Editor’s note: This is a cross-post written by Developer Experience Engineer, Marco Ippolito. Marco has his own blog at Medium.

Learn how to build a powerful code search tool using Orama

Searching for a specific piece of code in large JavaScript files can be daunting, especially in complex projects. As developers, we’ve all experienced the frustration of sifting through hundreds or even thousands of lines of code, trying to locate that elusive function or variable.

Traditional text editors and IDEs offer basic search functionality, but plain text search often falls short of providing accurate and contextually relevant results. It cannot tell where a variable is declared, nor can it distinguish between kinds of declaration such as “const”, “let”, “function”, or class properties. This limitation leads to wasted time and effort during code navigation.

In this article, we’ll learn how to build a powerful code search tool using Orama, an incredible full-text search engine.

Abstract Syntax Tree

When searching for specific code elements within JavaScript files, relying solely on string occurrences is insufficient. The reason is that JavaScript code can be written in various styles and can include complex structures that go beyond simple text patterns. To perform precise and accurate searches, we need a more sophisticated approach, which is where the Abstract Syntax Tree becomes essential.

An Abstract Syntax Tree is a powerful representation of a code file’s syntax and structure. It breaks down the code into a hierarchical tree of nodes, each representing a distinct syntactic element, such as functions, variables, expressions, loops, and more. This tree structure allows us to analyze the code’s context, relationships, and scope, providing a more accurate understanding of its components.

This code snippet showcases the transformation of the code into an Abstract Syntax Tree:


Here’s the AST representation:

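As a rough illustration (a hand-written sketch, not the article's original listing), the one-line program `const greet = "hello";` corresponds to an ESTree-style tree like the one below. Position fields such as `start`, `end`, and `loc` are left out for brevity, and the variable names `ast` and `declarator` are only illustrative.

```javascript
// Source being described: const greet = "hello";
// Simplified ESTree-style AST (position fields omitted for brevity).
const ast = {
  type: 'Program',
  sourceType: 'module',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'greet' },
          init: { type: 'Literal', value: 'hello', raw: '"hello"' },
        },
      ],
    },
  ],
};

// Walking down the tree by hand reaches the declared name and its value.
const declarator = ast.body[0].declarations[0];
console.log(declarator.id.name, '=', declarator.init.raw);
```

Even this tiny tree has four levels of nesting, which is why the rest of the article flattens it before searching.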

The Abstract Syntax Tree may seem overwhelming, as even a tiny code snippet creates a large and intricate JSON representation. But don’t worry; we’ll simplify it, making it easy to work with and explore its potential.

Let’s get started

First, we’ll read the content of the file we want to search using the node:fs module. Then we’ll use Acorn, a widely used JavaScript parser, to generate the AST:


Once we generate the AST, we have a complex JavaScript object representing the code’s structure. To simplify navigation and analysis, we flatten the AST into an array, reducing complexity.

Because the AST is hierarchical, child nodes live inside specific properties of their parent node. Before flattening, we therefore decide which properties to descend into. The fieldsToTraverse array lists the properties we explore when looking for further child nodes:


We also want a list of fields to pick from each node. The fieldsToPick array includes the properties we are interested in.


Now, we can create a function that traverses the AST to flatten it. This function recursively navigates through the hierarchical structure of the AST and converts it into a flat array representation. As it traverses the tree, it identifies nodes of interest (fieldsToTraverse) and extracts relevant information based on the specified properties (fieldsToPick).

The flattened array is then organized to preserve the relationships between nodes, making it easier to access and search for specific code elements.

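The traversal described above could be sketched as follows. The contents of `fieldsToTraverse` and `fieldsToPick`, the `flattenAst` and `isNode` names, and the use of incrementing string ids are all assumptions made for this example; the original project may choose different fields and a different id scheme.

```javascript
// Assumed lists for this sketch; the article's actual arrays may differ.
const fieldsToTraverse = [
  'body', 'declarations', 'id', 'init', 'params', 'key', 'value',
  'expression', 'callee', 'arguments', 'object', 'property', 'left', 'right',
];
const fieldsToPick = ['type', 'name', 'kind', 'value', 'loc'];

// An AST node is any non-null object carrying a string `type`.
const isNode = (x) => x !== null && typeof x === 'object' && typeof x.type === 'string';

function flattenAst(root) {
  const flat = [];
  let nextId = 0;

  function visit(node, parent) {
    const entry = {
      id: String(nextId++),
      parentId: parent ? parent.id : null,
      parentType: parent ? parent.type : null,
    };
    for (const field of fieldsToPick) {
      const picked = node[field];
      // Skip missing fields and fields holding child nodes (e.g. a method's `value`).
      if (picked !== undefined && !isNode(picked)) entry[field] = picked;
    }
    flat.push(entry);

    for (const field of fieldsToTraverse) {
      const children = Array.isArray(node[field]) ? node[field] : [node[field]];
      for (const child of children) {
        if (isNode(child)) visit(child, entry);
      }
    }
  }

  visit(root, null);
  return flat;
}

// Hand-written AST for: const greet = "hello";
const ast = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'const',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'greet' },
      init: { type: 'Literal', value: 'hello' },
    }],
  }],
};

const flat = flattenAst(ast);
console.log(flat);
```

Because `value` can hold either a literal's value or a child node, the sketch only copies it when it is not itself a node and otherwise descends into it.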

We have created an array of objects that looks like this:


We have successfully traversed and flattened the Abstract Syntax Tree. Each entry corresponds to an individual node in the tree and represents a specific piece of the JavaScript code. Each node is linked to its parent through the parentType and parentId properties, which preserve the hierarchical relationship between nodes.

The additional properties like name, kind, value, type, and loc provide valuable information about each code element. For instance, name might represent the name of a variable or function, kind could indicate the kind of declaration (e.g., “const” or “let”), and value may contain the value assigned to a variable. The type property specifies the type of node (e.g., “Identifier”, “Literal”, “MemberExpression”, etc.), and loc represents the source code location of the node.
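To make the shape of these entries concrete, here is a small hypothetical set of flattened nodes for `const greet = "hello";`, together with a helper that follows `parentId` back up to the root. The `id` field, the string ids, and the `ancestorTypes` helper are assumptions of this sketch.

```javascript
// Hypothetical flattened entries for: const greet = "hello";
const flatNodes = [
  { id: '0', parentId: null, parentType: null, type: 'Program' },
  { id: '1', parentId: '0', parentType: 'Program', type: 'VariableDeclaration', kind: 'const' },
  { id: '2', parentId: '1', parentType: 'VariableDeclaration', type: 'VariableDeclarator' },
  {
    id: '3',
    parentId: '2',
    parentType: 'VariableDeclarator',
    type: 'Identifier',
    name: 'greet',
    // Lines are 1-based and columns 0-based, following Acorn's convention.
    loc: { start: { line: 1, column: 6 }, end: { line: 1, column: 11 } },
  },
  { id: '4', parentId: '2', parentType: 'VariableDeclarator', type: 'Literal', value: 'hello' },
];

// Index entries by id so a parent can be looked up directly.
const byId = new Map(flatNodes.map((node) => [node.id, node]));

// Collect the node types from the immediate parent up to the root.
function ancestorTypes(node) {
  const chain = [];
  let current = byId.get(node.parentId);
  while (current) {
    chain.push(current.type);
    current = byId.get(current.parentId);
  }
  return chain;
}

console.log(ancestorTypes(flatNodes[3]).join(' < '));
```

Although the array is flat, the `parentId` links are enough to recover where any node sits in the original tree.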

Now we want to be able to search and analyze the entries based on their properties, identify specific code patterns, locate variable declarations, and analyze function calls.

Orama: the lightning-fast search engine

Orama is a fast, batteries-included, full-text search engine entirely written in TypeScript, with zero dependencies.

By providing our array of AST nodes to Orama, we gain the ability to execute complex queries on the data.

Let’s install Orama:


First of all, we create an Orama database by defining the structure of our data:


Then we insert our nodes inside the database:


And… that’s it! Once we’ve added our AST nodes to Orama, searching for specific information becomes straightforward!

Let’s use a more complex code snippet to perform our queries on:


The code snippet contains several variables with repeated names and different scopes, which can lead to confusion when searching using traditional search filters.

For instance, the name sayHello is used for a property of the class SayHello, a parameter of its constructor, and a function, which makes it hard to track its usage and assignments.

Similarly, the variable greet is declared multiple times, both as a local variable inside the function sayHello and as a global variable outside the function.
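A snippet with the properties described above might look like the following. This is a hypothetical reconstruction written to match the description, not the article's original listing; the `instance` variable and the string contents are invented for illustration.

```javascript
class SayHello {
  // `sayHello` as a class property
  sayHello;

  // `sayHello` as a constructor parameter
  constructor(sayHello) {
    this.sayHello = sayHello;
  }
}

// `sayHello` as a function
function sayHello(name) {
  // `greet` as a local const inside the function
  const greet = `Hello, ${name}!`;
  return greet;
}

// `greet` as a global let outside the function
let greet = sayHello('Orama');

const instance = new SayHello(greet);
console.log(instance.sayHello);
```

A plain text search for `sayHello` or `greet` returns every one of these occurrences with no indication of which is a declaration, a parameter, or a property.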

Let’s use Orama to search for the SayHello class declaration by filtering on parentType:


Output:


Acorn can attach a loc (location) field to each AST node when parsing with its locations option enabled. This information lets us determine precisely where each token appears in the source file.

We can also search for a variable and specify whether we are looking for a “let” or a “const” declaration:


Output:


Conclusion

This was a fun experiment where we used Orama and Acorn together to make it easier to explore and understand complex code. The combination of their features worked well, and it would be cool to create a plugin for Visual Studio Code using this system. With such a plugin, we could improve code search, analysis, and navigation, making coding a lot more efficient and enjoyable for developers.

You can download the source code from https://github.com/marco-ippolito/orama-ast.
