Building a Programming Language in Ruby: The Interpreter, Part 2 | DevsDay.ru

IT-блоги Building a Programming Language in Ruby: The Interpreter, Part 2

Honeybadger Developer Blog 9 августа 2021 г. Alex Braha Stoll


Full Source on Github

A complete implementation of the Stoffle programming language is available on GitHub. Feel free to open an issue if you find bugs or have questions.

In this blog post, we will continue implementing the interpreter for Stoffle, a toy programming language built entirely in Ruby. We started the interpreter in a previous post. You can read more about this project in the first part of this series.

In the last post, we covered how to implement the simpler features of Stoffle: variables, conditionals, unary and binary operators, data types, and printing to the console. Now, it is time to roll up our sleeves and tackle the more challenging remaining bits: function definition, function calling, variable scoping, and loops.

As we did previously, we will use the same example program from the start to the end of this post. We will go through it line by line, exploring the implementation necessary at the interpreter to bring to life each different structure in our Stoffle example program. Finally, we will see the interpreter in action and run the program by using the CLI we created in the previous article of the series.

Gauss is Back

If you have a good memory, you can probably remember that in part two of the series, we discussed how to build a Lexer. In that post, we took a look at a program to sum up numbers in a series to illustrate Stoffle's syntax. At the end of this article, we will finally be able to run the aforementioned program! So, here is the program again:

fn sum_integers: first_integer, last_integer
  i = first_integer
  sum = 0
  while i <= last_integer
    sum = sum + i

    i = i + 1
  end

  println(sum)
end

sum_integers(1, 100)

The abstract syntax tree (AST) for our integer summation program is the following:

Sample Program's AST

The Mathematician Who Inspired our Stoffle Sample Program

Carl Friedrich Gauss supposedly figured out on his own, at only 7 years old, a formula to sum up numbers in a series.

As you may have noticed, our program does not use the formula devised by Gauss. Since we have computers nowadays, we have the luxury of solving this problem in a "brute-force" way. Let our silicon friends do the hard work for us.

Function Definition

The first thing we do in our program is define the sum_integers function. What does it mean to declare a function? As you might be guessing, it is an action similar to assigning a value to a variable. When we define a function, we are associating a name (i.e., the function name, an identifier) with one or more expressions (i.e., the function's body). We also register to which names the values passed in during the function call should be bound. These identifiers become local variables during the execution of the function and are called parameters. The values passed in when the function is called (and are bound to the parameters) are arguments.

Let's take a look at #interpret_function_definition:

def interpret_function_definition(fn_def)
  env[fn_def.function_name_as_str] = fn_def
end

Pretty straightforward, huh? As you probably remember from the last post in this series, when our interpreter gets instantiated, we create an environment. This is a place used to hold the program state, and in our case, it is simply a Ruby hash. In the last post, we saw how variables and the values bound to them are stored in the env. Function definitions will also be stored there. The key is the function name, and the value is the AST node used to define a function (Stoffle::AST::FunctionDefinition). Here's a refresher on this AST node:

class Stoffle::AST::FunctionDefinition < Stoffle::AST::Expression
  attr_accessor :name, :params, :body

  def initialize(fn_name = nil, fn_params = [], fn_body = nil)
    @name = fn_name
    @params = fn_params
    @body = fn_body
  end

  def function_name_as_str
    # The instance variable @name is an AST::Identifier.
    name.name
  end

  def ==(other)
    children == other&.children
  end

  def children
    [name, params, body]
  end
end

Having the function name associated with a Stoffle::AST::FunctionDefinition means that we can access all the information needed to execute the function. For example, we have on hand the number of arguments expected and can easily emit an error if a function call does not provide it. This and other details we will see when we explore, next, the code responsible for interpreting a function call.

Calling a Function

Continuing the walk-through of our example, let us now focus on the function call. After defining the sum_integers function, we call it passing the numbers 1 and 100 as arguments:

fn sum_integers: first_integer, last_integer
  i = first_integer
  sum = 0
  while i <= last_integer
    sum = sum + i

    i = i + 1
  end

  println(sum)
end

sum_integers(1, 100)

The interpretation of a function call happens at #interpret_function_call:

def interpret_function_call(fn_call)
  return if println(fn_call)

  fn_def = fetch_function_definition(fn_call.function_name_as_str)

  stack_frame = Stoffle::Runtime::StackFrame.new(fn_def, fn_call)

  assign_function_args_to_params(stack_frame)

  # Executing the function body.
  call_stack << stack_frame
  value = interpret_nodes(fn_def.body.expressions)
  call_stack.pop
  value
end

This is a complex function, so we will need to take our time here. As explained in the last article, the first line is responsible for checking whether the function getting called is println. If we are dealing with a user-defined function, which is the case here, we move on and fetch its definition using #fetch_function_definition. As shown below, this function is a plain sail, and we basically retrieve the Stoffle::AST::FunctionDefinition AST node we previously stored in the environment or emit an error if the function does not exist.

def fetch_function_definition(fn_name)
  fn_def = env[fn_name]
  raise Stoffle::Error::Runtime::UndefinedFunction.new(fn_name) if fn_def.nil?

  fn_def
end

Returning to #interpret_function_call, things start to get more interesting. When thinking about functions in our simple toy language, we have two special concerns. First, we need a strategy to keep track of variables local to the function. We also have to handle return expressions. To tackle these challenges, we will instantiate a new object, which we will call frame, every time a function is called. Even if the same function is called multiple times, each new call will instantiate a new frame. This object will hold all the variables local to the function. Since one function can call another one and so on, we must have a way to represent and keep track of the execution flow of our program. To do so, we will use a stack data structure, which we will name call stack. In Ruby, a standard array with its #push and #pop methods will do as a stack implementation.

Call Stack and Stack Frame

Keep in mind that we are using the terms call stack and stack frame loosely. Processors and lower-level programming languages generally will also have call stacks and stack frames, but they are not exactly what we have here in our toy language.

If these concepts piqued your curiosity, I highly recommend researching call stacks and stack frames. If you really want to get closer to the metal, I would suggest specifically looking into processor call stacks.

Here is the code for implementing a Stoffle::Runtime::StackFrame:

module Stoffle
  module Runtime
    class StackFrame
      attr_reader :fn_def, :fn_call, :env

      def initialize(fn_def_ast, fn_call_ast)
        @fn_def = fn_def_ast
        @fn_call = fn_call_ast
        @env = {}
      end
    end
  end
end

Now, back to #interpret_function_call, the next step is to assign the values passed in the function call to the respective expected parameters, which will be accessible as local variables inside the function body. #assign_function_args_to_params is responsible for this step:

def assign_function_args_to_params(stack_frame)
  fn_def = stack_frame.fn_def
  fn_call = stack_frame.fn_call

  given = fn_call.args.length
  expected = fn_def.params.length
  if given != expected
    raise Stoffle::Error::Runtime::WrongNumArg.new(fn_def.function_name_as_str, given, expected)
  end

  # Applying the values passed in this particular function call to the respective defined parameters.
  if fn_def.params != nil
    fn_def.params.each_with_index do |param, i|
      if env.has_key?(param.name)
        # A global variable is already defined. We assign the passed in value to it.
        env[param.name] = interpret_node(fn_call.args[i])
      else
        # A global variable with the same name doesn't exist. We create a new local variable.
        stack_frame.env[param.name] = interpret_node(fn_call.args[i])
      end
    end
  end
end

Before we explore #assign_function_args_to_params implementation, it is first necessary to briefly discuss variable scoping. This is a complex and nuanced subject. For Stoffle, let us be very pragmatic and adopt a simple solution. In our tiny language, the only constructs that create new scopes are functions. Furthermore, global variables always come first. As a consequence, all variables declared (i.e., first usage) outside a function are global and are stored in env. Variables inside functions are local to them and are stored in the env of the stack frame created during the interpretation of the function call. There is one exception, though: a variable name that collides with an existing global variable. If a collision occurs, a local variable will not be created, and we will be reading and assigning to the existing global variable.

Okay, now that our variable scoping strategy is clear, let’s return to #assign_function_args_to_params. In the first segment of the method, we first retrieve the function definition and function call nodes from the stack frame object that was passed in. Having these on hand, it is easy to check whether the number of arguments provided match the number of parameters the function being called expects. We raise an error when there is a mismatch between the given arguments and the expected parameters. In the last part of #assign_function_args_to_params, we assign the arguments (i.e., the values) provided during the function call to their respective parameters (i.e., local variables inside the function). Note that we do check whether a parameter name collides with an existing global variable. As explained before, in these cases, we do not create a local variable inside the function's stack frame and instead just apply the passed in value to the existing global variable.

Returning to #interpret_function_call, we finally push our newly created stack frame to the call stack. Then, we call our old friend #interpret_nodes to start interpreting the function body:

def interpret_function_call(fn_call)
  return if println(fn_call)

  fn_def = fetch_function_definition(fn_call.function_name_as_str)

  stack_frame = Stoffle::Runtime::StackFrame.new(fn_def, fn_call)

  assign_function_args_to_params(stack_frame)

  # Executing the function body.
  call_stack << stack_frame
  value = interpret_nodes(fn_def.body.expressions)
  call_stack.pop
  value
end

Interpreting the Function Body

Now that we have interpreted the function call itself, it’s time to interpret the function body:

fn sum_integers: first_integer, last_integer
  i = first_integer
  sum = 0
  while i <= last_integer
    sum = sum + i

    i = i + 1
  end

  println(sum)
end

sum_integers(1, 100)

The first two lines of our sum_integers function are variable assignments. We covered this topic in the previous blog post of this series. However, we now have variable scoping, and as a consequence, the code that deals with assignment has changed a bit. Let us explore it:

def interpret_var_binding(var_binding)
  if call_stack.length > 0
    # We are inside a function. If the name points to a global var, we assign the value to it.
    # Otherwise, we create and / or assign to a local var.
    if env.has_key?(var_binding.var_name_as_str)
      env[var_binding.var_name_as_str] = interpret_node(var_binding.right)
    else
      call_stack.last.env[var_binding.var_name_as_str] = interpret_node(var_binding.right)
    end
  else
    # We are not inside a function. Therefore, we create and / or assign to a global var.
    env[var_binding.var_name_as_str] = interpret_node(var_binding.right)
  end
end

Do you remember when we pushed the stack frame created for the function call into call_stack? This comes handy now since we can check whether we are inside a function by verifying that call_stack has a length greater than zero (i.e., has at least one stack frame). If we are inside a function, which is the case in the code we are currently interpreting, we check whether we already have a global variable with the same name of the variable to which we are now trying to bind a value. As you already know, if a collision happens, we will simply assign the value to the existing global variable, and a local one will not be created. When the name is not being used, we create a new local variable and assign the intended value to it. Since call_stack is a stack (i.e., last in first out data structure), we know that this local variable should be stored in the env of the last stacked frame (i.e., the frame created for the function currently being processed). Finally, the last part of #interpret_var_binding deals with assignments happening outside functions. Since only functions create new scopes in Stoffle, nothing changes here, as variables created outside functions are always global and stored at the instance variable env.

Returning to our program, the next step is to interpret the loop responsible for summing up the integers. Let us refresh our memory and take a look at the AST of our Stoffle program again:

Sample Program's AST

The node representing the loop is Stoffle::AST::Repetition:

class Stoffle::AST::Repetition < Stoffle::AST::Expression
  attr_accessor :condition, :block

  def initialize(cond_expr = nil, repetition_block = nil)
    @condition = cond_expr
    @block = repetition_block
  end

  def ==(other)
    children == other&.children
  end

  def children
    [condition, block]
  end
end

Note that this AST node basically brings together concepts we have explored in previous articles. For its conditional, we will have an expression that will generally have at its root (think about the expression's AST root node) a Stoffle::AST::BinaryOperator (e.g., '>', 'or' and so on). For the body of the loop, we will have an Stoffle::AST::Block. This makes sense, right? The most basic form of a loop is one or more expressions (a block) to be repeated while an expression is truthy (i.e., while the conditional evaluates to a truthy value).

The corresponding method at our interpreter is #interpret_repetition:

def interpret_repetition(repetition)
  while interpret_node(repetition.condition)
    interpret_nodes(repetition.block.expressions)
  end
end

Here, you may be surprised by the simplicity (and, dare I say, beauty) of this method. We can implement the interpretation of loops by combining methods that we have already explored in past articles. By using Ruby's while loop, we can make sure that we continue interpreting the nodes that compose our Stoffle loop (by repeatedly calling #interpret_nodes) while the evaluation of the conditional is true. The job of evaluating the conditional is as easy as calling the usual suspect, the #interpret_node method.

Returning from the Function

We are almost at the finish line! After the loop, we proceed and print out the result of the summation to the console. We are not going through it again since we already covered it in the last part of the series. As a quick recap, remember that the println function is provided by Stoffle itself and, internally in the interpreter, we are simply using Ruby's own puts method.

To finish this post, we have to revisit #interpret_nodes. Its final version is a bit different from the one we saw in times past. Now, it includes code to handle returning from a function and unwinding the call stack. Here is the completed version of #interpret_nodes in its full glory:

def interpret_nodes(nodes)
  last_value = nil

  nodes.each do |node|
    last_value = interpret_node(node)

    if return_detected?(node)
      raise Stoffle::Error::Runtime::UnexpectedReturn unless call_stack.length > 0

      self.unwind_call_stack = call_stack.length # We store the current stack level to know when to stop returning.
      return last_value
    end

    if unwind_call_stack == call_stack.length
      # We are still inside a function that returned, so we keep on bubbling up from its structures (e.g., conditionals, loops etc).
      return last_value
    elsif unwind_call_stack > call_stack.length
      # We returned from the function, so we reset the "unwind indicator".
      self.unwind_call_stack = -1
    end
  end

  last_value
end

As you already know, #interpret_nodes is used every time we need to interpret a bunch of expressions. It is used to start interpreting our program and on every occasion when we encounter nodes that have a block associated with them (such as Stoffle::AST::FunctionDefinition). Specifically, when dealing with functions, there are two scenarios: interpreting a function and hitting a return expression or interpreting a function to its end and not hitting any return expressions. In the second case, it means either the function does not have any explicit return expressions or the code path that we went through did not have a return.

Let us refresh our memories before continuing. As you probably remember from a couple paragraphs above, #interpret_nodes was called when we started interpreting the sum_integers function (i.e., when it got called in our program). Again, here is the source code of the program we are going through:

fn sum_integers: first_integer, last_integer
  i = first_integer
  sum = 0
  while i <= last_integer
    sum = sum + i

    i = i + 1
  end

  println(sum)
end

sum_integers(1, 100)

We are at the end of interpreting the function. As you might be guessing, our function does not possess an explicit return. This is the easiest path of #interpret_nodes. We basically iterate through all the function nodes, returning the value of the last interpreted expression at the end (quick reminder: Stoffle has implicit returns). This takes us to the finish line, concluding the interpretation of our program.

Although our program has now been fully interpreted, the main purpose of this article is to explain the implementation of the interpreter, so let’s take a bit more time here and see how the interpreter deals with cases in which we do hit a return inside a function.

First, the return expression gets evaluated at the beginning of the operation. Its value will be the evaluation of what is being returned. Here is the code for Stoffle::AST::Return:

class Stoffle::AST::Return < Stoffle::AST::Expression
  attr_accessor :expression

  def initialize(expr)
    @expression = expr
  end

  def ==(other)
    children == other&.children
  end

  def children
    [expression]
  end
end

Then, we have a simple conditional that will detect return AST nodes. Having done this, we first perform a sanity check to verify that we are inside a function. To do so, we can simply check the length of the call stack. A length greater than zero means we are indeed inside a function. In Stoffle, we do not allow the use of return expressions outside functions, so we raise an error if this check fails. Before returning the value intended by the programmer, we first keep record of the current length of the call stack, storing it at the instance variable unwind_call_stack. To understand why this is important, let’s review #interpret_function_call:

def interpret_function_call(fn_call)
  return if println(fn_call)

  fn_def = fetch_function_definition(fn_call.function_name_as_str)

  stack_frame = Stoffle::Runtime::StackFrame.new(fn_def, fn_call)

  assign_function_args_to_params(stack_frame)

  # Executing the function body.
  call_stack << stack_frame
  value = interpret_nodes(fn_def.body.expressions)
  call_stack.pop
  value
end

Here, at the end of #interpret_function_call, note that we pop the stack frame from the call stack after interpreting the function. As a consequence, the length of the call stack will be reduced by one. Since we have preserved the length of the stack at the moment we detected the return, we can compare this initial length every time we interpret a new node at #interpret_nodes. Let’s take a look at the segment that does this inside the node iterator of #interpret_nodes:

def interpret_nodes(nodes)
  # ...

  nodes.each do |node|
    # ...

    if unwind_call_stack == call_stack.length
      # We are still inside a function that returned, so we keep on bubbling up from its structures (e.g., conditionals, loops etc).
      return last_value
    elsif unwind_call_stack > call_stack.length
      # We returned from the function, so we reset the "unwind indicator".
      self.unwind_call_stack = -1
    end

    # ...
  end

  # ...
end

This may be a bit difficult to grasp at first. I encourage you to check the full implementation on GitHub and play around with it if you think it may help you understand this last bit of the interpreter. The crucial point to have in mind here is that a typical program has many deeply nested structures. Ergo, executing #interpret_nodes generally results in a new call to #interpret_nodes, which may result in more calls to #interpret_nodes and so on! When we hit a return inside a function, we may be inside a deeply nested structure. Imagine, for example, that the return is inside a conditional that is part of a loop. To return from the function, we have to return from all #interpret_nodes calls until we return from the one initiated by #interpret_function_call (i.e., the call to #interpret_nodes that started the interpretation of the function body).

At the segment of code above, we highlight exactly how we do this. By having a positive value at @unwind_call_stack and one that is equal to the current length of the call stack, we know for certain that we are inside a function and that we still did not return from the original call initiated by #interpret_function_call. When this finally happens, @unwind_call_stack will be greater than the current length of the call stack; thus, we know that we have exited the function that returned, and we no longer need to continue the process of bubbling up. Then, we reset @unwind_call_stack. Just to make the use of @unwind_call_stack crystal clear, here are its possible values:

  • -1, its initial and neutral value, indicating that we are not inside any functions that returned.
  • positive value equal to the call stack length, indicating that we are still inside a function that returned.
  • positive value greater than the call stack length, indicating that we are no longer inside the function that returned.

Running our Program Using the Stoffle CLI

In the previous article of the series, we created a simple CLI to make interpreting Stoffle programs easier. Now that we have explored the interpreter's implementation, let’s see it in action, running our program. As shown above in many different sections, our code defines and then calls the sum_integers function passing the arguments 1 and 100. If our interpreter is working properly, we should see 5050.0 (the sum of the set of integers starting at 1 and ending at 100) printed to the console:

Sum Integers Function Running

Closing Thoughts

In this post, we implemented the final parts needed to complete our interpreter. We tackled function definition, function calling, variable scoping, and loops. Now, we have a simple but working programming language!

In the next and final part of this series, I will share some resources that I consider great options for those that want to continue their programming language implementation studies. I will also propose some challenges you can take on to continue your learning while improving your version of Stoffle. See you later; ciao!

Источник: Honeybadger Developer Blog

Наш сайт является информационным посредником. Сообщить о нарушении авторских прав.