As the volume of malware grows, an approach called shared code analysis, or similarity analysis, can save malware researchers a great deal of reverse engineering work.
Shared code analysis is an approach to comparing two malware samples by estimating the percentage of precompilation source code they share.
There are four measures to identify similarity between malware samples:
1- Instruction sequence based similarity (x86 assembly instructions).
2- String based similarity.
3- IAT (Import Address Table) based similarity.
4- Dynamic API call based similarity (you can collect malicious API calls from logs).
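To make the string based measure concrete, here is a minimal sketch of turning a sample's raw bytes into a set of printable strings, which can then be compared as a set. The function name `extract_strings` and the minimum length of 5 are assumptions chosen for illustration, not part of any specific tool.

```python
import re
from typing import Set


def extract_strings(data: bytes, min_len: int = 5) -> Set[str]:
    """Return the set of printable ASCII strings with at least min_len characters.

    Hypothetical helper for illustration: it scans raw bytes for runs of
    printable ASCII characters (0x20 to 0x7e).
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {match.decode("ascii") for match in pattern.findall(data)}


# Made-up bytes standing in for a binary file's contents.
sample = b"\x00\x01http://example.test/a\x00ab\x00CreateFileA\x00"
print(sorted(extract_strings(sample)))
```

The same idea applies to the other measures: instruction sequences, imported function names, or logged API calls can each be collected into a set per sample.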
Benefits of the shared code analysis approach:
- Determine a new malware sample's code similarity to thousands of previously seen malware samples.
- Identify new malware families based on shared code.
- Visualize malware relationships to learn the most common techniques that threat actors use (this is important when building an ML-based malware detector).
- Reduce the amount of manual reverse engineering work.
How does shared code analysis work?
You can measure the similarity between malware samples using the Jaccard index.
Jaccard index: compares the members of two sets to see which members are shared and which are distinct. It is a measure of similarity between the two sets, ranging from 0% to 100%.
To compute the similarity with the Jaccard index, use the following equation:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \times 100$$
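For example, suppose you have two sets A = {1, 2, 3, 4, 5} and B = {2, 9, 8, 7, 10, 5}.
The members they share are {2, 5}, so the intersection has 2 members. Together they contain {1, 2, 3, 4, 5, 7, 8, 9, 10}, so the union has 9 members.
You can find the similarity between the two sets with the Jaccard index:
$$\frac{|A \cap B|}{|A \cup B|} \times 100 = \frac{2}{9} \times 100 \approx 22.2\%$$
A result of 22.2% means that roughly 22% of the distinct members across both sets are common to both.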
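The worked example can be checked with a few lines of Python. This is a minimal sketch. The function name `jaccard_index` is chosen for illustration, and returning 0 for two empty sets is an assumed convention.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return the Jaccard index of two sets as a percentage (0 to 100)."""
    if not a and not b:
        # Assumed convention: two empty sets are treated as 0% similar.
        return 0.0
    return len(a & b) / len(a | b) * 100


A = {1, 2, 3, 4, 5}
B = {2, 9, 8, 7, 10, 5}
print(f"{jaccard_index(A, B):.1f}%")  # prints 22.2%
```

For malware, the sets would hold features such as strings, imported functions, or instruction sequences rather than numbers.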
"To scale malware similarity comparisons, we need to use randomized comparison approximation algorithms.
[An approach] known as minhash serves this purpose beautifully. The minhash method allows us to compute the Jaccard index using approximation to avoid computing similarities between non-similar malware samples."
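To illustrate the idea behind minhash, here is a minimal sketch. It is not the book's implementation. It assumes features are strings, simulates a family of hash functions by salting SHA-256 with a seed, and uses 128 hashes. The names `minhash_signature` and `estimate_jaccard` are hypothetical.

```python
import hashlib
from typing import Iterable, List


def _seeded_hash(feature: str, seed: int) -> int:
    """Hash a feature with a seed, simulating one of many hash functions."""
    digest = hashlib.sha256(f"{seed}:{feature}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(features: Iterable[str], num_hashes: int = 128) -> List[int]:
    """Return the minimum hash value per seed. The feature set must be non-empty."""
    feature_list = list(features)
    return [
        min(_seeded_hash(f, seed) for f in feature_list)
        for seed in range(num_hashes)
    ]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Estimate the Jaccard index (as a percentage) from two signatures."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a) * 100


# Made-up feature sets standing in for two samples' extracted strings.
sample_a = {"CreateFileA", "WriteFile", "http://example.test/a", "cmd.exe"}
sample_b = {"CreateFileA", "WriteFile", "RegSetValueA", "cmd.exe"}
estimate = estimate_jaccard(minhash_signature(sample_a), minhash_signature(sample_b))
print(f"Estimated similarity: {estimate:.1f}%")
```

The fraction of positions where two signatures agree approximates the Jaccard index of the underlying sets. Because each signature has a fixed length, signatures can be computed once per sample and grouped so that only samples likely to be similar are compared in full.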
References:
For more details about similarity analysis using the Jaccard index and the MinHash algorithm, visit the following link:
https://lnkd.in/dRPEKhaU
The code that implements this approach (Jaccard index and MinHash algorithm) is in chapter 5 of the Malware Data Science book.
Malware Data Science book:
https://lnkd.in/dnTyRk2X