Introduction to parsing Swift code with the SwiftSyntax Library
A quick tutorial for parsing Swift code from Swift code.
How would you build a tool to automatically convert Swift code to JavaScript? This is a hard problem that becomes even harder, if not impossible, when you use the wrong tools. To do it correctly, we will need to steal some ideas from how compilers work.
The first step is to parse Swift code into an Abstract Syntax Tree (AST). For this task we can use SwiftSyntax, a library that lets you “parse, inspect, generate, and transform Swift source code.”
In this guide I will show you how to use SwiftSyntax to parse code into an AST, and how to pretty-print the AST to see how it works.
My Setup
Here is the setup I’m using:
macOS Version 10.15.6 (19G73)
Xcode Version 11.6 (11E708)
Swift version 5.2.4 (bundled with Xcode)
If you have recently downloaded a new Xcode, make sure to switch to it using:
sudo xcode-select -s "/<Some Path>/Xcode.app/"
If you are using the Xcode 12 beta, this guide may not work for you. These instructions did not work when I tried using Version 12.0 beta 3 (12A8169g).
Setting up SwiftSyntax
To start, set up a new project called SwiftCodeAnalyzer:
mkdir SwiftCodeAnalyzer
cd SwiftCodeAnalyzer
swift package init --type executable
Inside Package.swift, include SwiftSyntax as a package dependency, and also as a dependency of the SwiftCodeAnalyzer target.
// swift-tools-version:5.2
// The swift-tools-version declares the minimum version of Swift required to build this package.

import PackageDescription

let package = Package(
    name: "SwiftCodeAnalyzer",
    dependencies: [
        .package(name: "SwiftSyntax", url: "https://github.com/apple/swift-syntax.git", .exact("0.50200.0")),
    ],
    targets: [
        // Targets are the basic building blocks of a package. A target can define a module or a test suite.
        // Targets can depend on other targets in this package, and on products in packages this package depends on.
        .target(
            name: "SwiftCodeAnalyzer",
            dependencies: ["SwiftSyntax"]),
        .testTarget(
            name: "SwiftCodeAnalyzerTests",
            dependencies: ["SwiftCodeAnalyzer"]),
    ]
)
To ensure you have everything set up properly, try importing SwiftSyntax, and then use swift run to run the project.
// Add this code inside of Sources/SwiftCodeAnalyzer/main.swift
import SwiftSyntax
print("SwiftSyntax successfully imported!")
Parsing code with SwiftSyntax
Now that we have successfully imported SwiftSyntax into our project, we can begin using it to parse some Swift code. In this project we will be parsing code stored as a string, but SwiftSyntax also supports parsing code stored in a file.
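For completeness, here is a minimal sketch of the file-based variant, assuming the same SwiftSyntax 0.50200.0 dependency as above. It uses the SyntaxParser.parse overload that takes a file URL; the temporary file name Example.swift and its contents are made up for illustration.

```swift
import Foundation
import SwiftSyntax

// Write a small Swift file to a temporary location so the example is self-contained.
// The file name and contents are hypothetical.
let fileURL = URL(fileURLWithPath: NSTemporaryDirectory())
    .appendingPathComponent("Example.swift")
let fileContents = "let answer = 42\n"
try! fileContents.write(to: fileURL, atomically: true, encoding: .utf8)

// Parse the file directly from its URL instead of from a string.
let fileRoot: SourceFileSyntax = try! SyntaxParser.parse(fileURL)

// The parsed tree reproduces the file contents.
print(fileRoot.description)
```

The rest of this guide sticks to the string-based call, since it keeps every example in a single file.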
Converting code to an AST
Here is how you can parse code with SwiftSyntax:
import SwiftSyntax

let swiftSource = """
import Foundation
func test() {
    print("hello world")
}
test()
"""

let rootNode: SourceFileSyntax = try! SyntaxParser.parse(source: swiftSource)

// We will replace this in the next step.
print(rootNode.description)
The output of SyntaxParser.parse is the root node of the AST. It contains a list of child nodes, and each child node in turn contains its own children. This structure repeats recursively until we hit the leaf nodes, which are typically atomic syntax units like numbers or variable names.
If you run this code, rootNode.description will recursively traverse each node in the AST and print out the string associated with it. The result is a string that precisely matches what is stored inside swiftSource. This isn't very interesting, since we already know what the source code looks like. Let’s try pretty-printing the AST directly and displaying the name of each node.
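To make the round trip described above concrete, here is a small sketch that does not use SwiftSyntax at all. ToyNode, sourceText, and the node names are hypothetical stand-ins invented for this illustration: leaves carry text, branches only carry children, and concatenating the leaves in order gives back the original source, which is roughly what description does on the real tree.

```swift
// A toy stand-in for a syntax tree. This is NOT a SwiftSyntax type.
enum ToyNode {
    case leaf(String)                // an atomic unit that carries source text
    case branch(String, [ToyNode])   // a named node that only holds children

    // Plays the role of `description`: joins the text of every leaf, in order.
    var sourceText: String {
        switch self {
        case .leaf(let text):
            return text
        case .branch(_, let children):
            return children.map { $0.sourceText }.joined()
        }
    }
}

let toySource = "test()\n"
let toyTree = ToyNode.branch("Call", [
    .leaf("test"),
    .branch("Arguments", [.leaf("("), .leaf(")\n")]),
])

// Walking the tree reproduces the source exactly.
print(toyTree.sourceText == toySource) // prints "true"
```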
Pretty-Printing the AST
Here is how you can pretty-print the AST:
import SwiftSyntax

let swiftSource = """
import Foundation
func test() {
    print("hello world")
}
test()
"""

let rootNode: SourceFileSyntax = try! SyntaxParser.parse(source: swiftSource)
recursivePrint(node: Syntax(rootNode), indent: 0)

// Prints the type name of `node`, then recurses into each of its children.
func recursivePrint(node: Syntax, indent: Int) {
    // Two spaces per nesting level.
    let indentString = String(repeating: "  ", count: indent)
    let nodeName = String(describing: node.customMirror.subjectType)
    print(indentString + nodeName)
    for child in node.children {
        recursivePrint(node: child, indent: indent + 1)
    }
}
This code recursively traverses the AST and prints out the name of each node. Each level is indented by an additional 2 spaces.
You might notice that the code does something weird here: String(describing: node.customMirror.subjectType). This is a hack, and you probably should not use it in production. It was just the simplest way I could get the name of the node for the purposes of this tutorial.
If you run the code, the output should look something like this:
SourceFileSyntax
  CodeBlockItemListSyntax
    CodeBlockItemSyntax
      ImportDeclSyntax
        TokenSyntax
        AccessPathSyntax
          AccessPathComponentSyntax
            TokenSyntax
    CodeBlockItemSyntax
      FunctionDeclSyntax
        TokenSyntax
        ...
Congratulations! You just built a tool that can parse Swift code and visualize its AST.
Explaining how the AST works is out of scope for this guide, but I would encourage you to play around with changing the source code to see how it affects the AST.
If you want to learn more about SwiftSyntax and how to use it properly, please take a look at the SyntaxVisitor class, and also read up on the visitor pattern. If I were writing real production code with SwiftSyntax, this is what I would use to parse and analyze the AST of some code. If I get around to writing another article on SwiftSyntax, I will likely cover the visitor pattern and how to use it.
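As a taste of the visitor pattern itself, here is a rough sketch written in plain Swift. Everything in it (ToySyntax, ToyVisitor, CallCollector) is a hypothetical toy made up for this example and is not SwiftSyntax's API; the real SyntaxVisitor has its own method names and signatures, which vary between releases. The idea is the same, though: a base class walks the tree, and a subclass overrides only the hooks it cares about.

```swift
// Toy syntax tree for illustration only. These are NOT SwiftSyntax types.
enum ToySyntax {
    case functionDecl(name: String, body: [ToySyntax])
    case call(name: String)
}

// Base visitor: owns the traversal and exposes one overridable hook per node kind.
class ToyVisitor {
    func visitFunctionDecl(name: String) {}
    func visitCall(name: String) {}

    // Walks the tree depth-first, invoking the matching hook for each node.
    final func walk(_ node: ToySyntax) {
        switch node {
        case .functionDecl(let name, let body):
            visitFunctionDecl(name: name)
            for child in body {
                walk(child)
            }
        case .call(let name):
            visitCall(name: name)
        }
    }
}

// A concrete visitor that only cares about calls and records their names.
final class CallCollector: ToyVisitor {
    private(set) var calledNames: [String] = []

    override func visitCall(name: String) {
        calledNames.append(name)
    }
}

// Roughly mirrors the sample source: a function that calls print, then a call to it.
let toyProgram: [ToySyntax] = [
    .functionDecl(name: "test", body: [.call(name: "print")]),
    .call(name: "test"),
]

let collector = CallCollector()
for node in toyProgram {
    collector.walk(node)
}
print(collector.calledNames) // prints ["print", "test"]
```

Compared with the recursive printing function earlier, the traversal lives in one place and each analysis becomes a small subclass, which is why this pattern tends to scale better for real tools.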