Code search requires a special-purpose search engine, so we’re very pleased that Palamida has been granted a US patent (No. 7,711,719) for our Massive Multi-Pattern Searching algorithm.
Searching source code is harder than searching the web because the goal is different, and that goal calls for a multi-pattern approach. In a web search, the goal is to find sites that contain content defined by a relatively small number of words. In a source code search, the goal is to determine whether all or part of one program is contained in another. That is best done by breaking the programs up into a massive number of short search terms (source code fingerprints) and searching the second program for the fingerprints of the first. By analyzing the results we can determine, for example, that your code contains zlib. It’s a huge task, and without a highly optimized way of creating and comparing the search terms, the processing time would be impractically long.
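To make the fingerprint idea concrete, here is a minimal Python sketch. It is not the patented algorithm, whose details are not described in this post. It assumes a fingerprint is simply a hash of a sliding window of whitespace-separated tokens, and the names `fingerprints`, `containment`, and the window size `k` are hypothetical choices made for illustration.

```python
import hashlib


def fingerprints(source, k=5):
    """Hash every window of k consecutive tokens into a short fingerprint.

    Assumption: tokens are whitespace-separated; a real tool would use a
    language-aware tokenizer and normalize comments and formatting.
    """
    tokens = source.split()
    return {
        hashlib.sha1(" ".join(tokens[i:i + k]).encode("utf-8")).hexdigest()[:16]
        for i in range(len(tokens) - k + 1)
    }


def containment(component, target, k=5):
    """Fraction of the component's fingerprints that also occur in the target."""
    component_prints = fingerprints(component, k)
    if not component_prints:
        return 0.0  # component is shorter than one window
    target_prints = fingerprints(target, k)
    return len(component_prints & target_prints) / len(component_prints)
```

A score near 1.0 from `containment(library_source, your_source)` suggests the library is embedded in your code, while a score near 0.0 suggests it is not. The naive set intersection here is workable for a pair of files; comparing against a large catalog of components is where a highly optimized multi-pattern search becomes necessary.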
The patent is further evidence that our category, Software Composition Analysis, requires new techniques and approaches, and of our commitment to the innovation needed to deliver them.
