Daniel Cantwell

January 12, 2024

Questions in Questions

  1. Setting Up
  2. A Challenge
  3. Enter: LISP
  4. Powering Down

When someone asks a true/false question like "Is this book any good?", it is almost never atomic; it can usually be broken down into subquestions.

  • is the narrative voice persuasive?
  • does the asker enjoy the subject material?
  • is the story compelling?
  • ...

The combination of answers to each of these questions yields the answer to "Is this book any good?". Each of these sub-questions is also known as a predicate: a statement which is true or false.

If someone gave you two predicates (or more generally, a collection of them), how would you go about combining them?

If I gave you one predicate composed of many sub-predicates, could you switch out these sub-predicates for some other ones? For instance, what if the reader changed their mind about needing a strong narrative voice, and decided that they didn't want to consider that as a factor at all?

Setting Up

A predicate is a function of the form T -> bool. It takes some object and tests whether or not it satisfies some boolean condition.

    from typing import Callable, TypeVar

    T = TypeVar("T")
    Predicate = Callable[[T], bool]
    pos_len: Predicate = lambda x: len(x) > 0
    a_prefixed: Predicate = lambda x: x.startswith("a")

We'll restrict ourselves to predicates of a fixed type T, since it is pretty pointless to ask about a property that a thing doesn't have (e.g. applying the predicate pos_len to the integer 5 doesn't even make sense, since integers have no intrinsic concept of length).
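
A quick illustration of that mismatch:

    pos_len("apple")  # True
    pos_len(5)        # TypeError: object of type 'int' has no len()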

You may have learned in school that you can:

  1. and two predicates
  2. or two predicates
  3. not a single predicate

In code, these look like:
    def true(x): 
        # the trivial predicate
        return True

    def and_(p1: Predicate = true, p2: Predicate = true) -> Predicate: 
        # arguments default to the trivial predicate `true` so that calling this
        # with a single argument still produces a coherent answer
        return lambda x: p1(x) and p2(x)

    def or_(p1: Predicate = true, p2: Predicate = true) -> Predicate: 
        return lambda x: p1(x) or p2(x)

    def not_(p: Predicate) -> Predicate: 
        return lambda x: not p(x)
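
As a quick sanity check, here's how these combinators behave on the predicates from earlier (values chosen only for illustration):

    nonempty_and_a = and_(pos_len, a_prefixed)
    nonempty_and_a("apple")     # True
    nonempty_and_a("banana")    # False (doesn't start with "a")
    not_(a_prefixed)("banana")  # True
    and_(pos_len)("")           # False; the second argument defaults to the trivial predicate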

Since we're modeling each predicate as a function, how do we usually combine functions to build more complicated ones?

One typical way is function composition.

    from itertools import chain
    from functools import reduce

    def comp(f, *fns):
        # assumes unary fns
        # composes from the outside in: comp(f, g, h)(x) == f(g(h(x)))
        rfs = list(chain([f], fns))
        rfs.reverse()
        def compd(*args, **kwargs):
            return reduce(
                lambda result, fn: fn(result),
                rfs[1:],
                rfs[0](*args, **kwargs))

        return compd

But you can't just compose predicates like normal math functions: their type signatures don't match! A predicate produces a bool, while the next predicate in the chain expects a T.
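
To see the mismatch concretely, here's a quick sketch of what goes wrong if we try it anyway:

    broken = comp(pos_len, a_prefixed)
    broken("apple")
    # a_prefixed("apple") returns True, then pos_len(True) calls len(True)
    # and raises a TypeError: a bool is not the T the next predicate expects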

You can't compose uses of and_ and or_ together either, because they both take two predicates and only return one!

Getting something composable means that we want something with a signature like Predicate -> Predicate; a gadget that takes a predicate and returns a new predicate, but modified in some way. Then we could compose to our heart's content to build complex predicates out of simple ones!

An immediate example we can see is partially applying one of and_, or_.

    from functools import partial

    and_pos_length = partial(and_, pos_len)
    # Predicate -> Predicate

Now we have our hands on one of these things. Let's call it a predicate transformer. This particular one takes in some predicate and constrains it with an additional predicate.
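
Here's a quick sketch of this transformer in action (shorter_than_5 is just an illustrative predicate, not one defined above):

    shorter_than_5: Predicate = lambda x: len(x) < 5
    constrained = and_pos_length(shorter_than_5)
    # constrained(x) == pos_len(x) and shorter_than_5(x)
    constrained("hi")       # True
    constrained("")         # False (fails pos_len)
    constrained("abcdefg")  # False (too long)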

We can do a similar thing with or_:

    or_pos_length = partial(or_, pos_len)
    # Predicate -> Predicate

This takes some predicate P and loosens it such that P can be True or the item in question can have positive length.
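
For instance, applying it to a_prefixed (the name a_or_nonempty below is just illustrative):

    a_or_nonempty = or_pos_length(a_prefixed)
    # a_or_nonempty(x) == pos_len(x) or a_prefixed(x)
    a_or_nonempty("zebra")  # True (positive length, even though it doesn't start with "a")
    a_or_nonempty("")       # False (fails both conditions)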

Now we can freely comp these:

    and_a_prefixed = partial(and_, a_prefixed)
    and_b_suffixed = partial(and_, lambda x: x.endswith("b"))
    a_and_pos_length_and_b_er = comp(and_pos_length, and_a_prefixed, and_b_suffixed)
    # type: Predicate -> Predicate

To actually evaluate the base predicate represented by a_and_pos_length_and_b_er on an input string, we first need to pass it a starting predicate. In this case, we are only and-ing predicates together, so we should feed the predicate transformer the trivial predicate true that we defined earlier:

    a_and_pos_length_and_b_er(true)("ab")  # True
    a_and_pos_length_and_b_er(true)("abc") # False
    a_and_pos_length_and_b_er(true)("bb")  # False
    a_and_pos_length_and_b_er(true)("acb") # True
    a_and_pos_length_and_b_er(true)(7)     # Undefined; throws an exception due to type mismatch

When applied to another non-trivial predicate P like lambda x: "dog" in x, a_and_pos_length_and_b_er tacks on the additional constraints to P specified by its component predicates! Hence the _er on the function name.
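
For example, layering it on top of a non-trivial base predicate (has_dog is just an illustrative name):

    has_dog: Predicate = lambda x: "dog" in x
    a_dog_b = a_and_pos_length_and_b_er(has_dog)
    a_dog_b("a dog b")  # True: non-empty, starts with "a", ends with "b", contains "dog"
    a_dog_b("a cat b")  # False: everything holds except the "dog" constraint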

These predicate transformers are secretly an instance of the Decorator pattern from your OO classes in college, even if they don't seem that way at first.

When I was taught this pattern, I learned it as something you do to classes. The purpose was to "decorate" a class with additional behavior.

However, decorators can be applied to a much more primitive concept in programming: functions! A decorator function is a higher-order function that takes a given type of function as input, adds some additional behavior to that function (perhaps by pre-processing the input, post-processing the output, logging the function's result, etc.), and then returns an altered function as output. In our case, the altered function is of the same type (Predicate), but this need not be the case.
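
As a small illustration (not specific to predicates), here's a sketch of a decorator that logs a function's result before returning it:

    def logged(fn):
        # higher-order function: wraps `fn` and prints its result on every call
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            print(f"{fn.__name__}{args} -> {result}")
            return result
        return wrapper

    logged(pos_len)("apple")  # prints "pos_len('apple',) -> True" and returns True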

Our predicate transformers fit right into this pattern by "decorating" a predicate with some additional constraints (or loosenings of current constraints).

A Challenge

Try writing a predicate transformer called drop that takes a compound predicate P and a predicate transformer to remove, and removes that predicate transformer from P.

For example:

    # Ex: 
    P = lambda x: x.name == "Zola" and x.job == "Farmer"
    named_zola = partial(and_, lambda x: x.name == "Zola")
    is_farmer = partial(and_, lambda x: x.job == "Farmer")

    # or expressed as a composition with our predicate transformers: 
    P = comp(named_zola, is_farmer)(true)

    def drop(p: Predicate, p_to_remove) -> Predicate: 
        ...

    # desired result: drop(P, is_farmer) is a predicate that only checks the name, i.e. behaves like `named_zola(true)`.

If this felt really hard, fear not! You actually can't simply "remove" a predicate like this once it's and-ed with another predicate. The main reason is that we're only able to interact with the composite predicate through operations like and, or, and not. We can't access the individual predicates and operations that make up the parent predicate; we can only tack stuff on after the fact.

What drop is leading us to do is consider our predicate as a data structure that we can manipulate (i.e. remove some elements from it in a sensible way).

So, let's reconsider our model of a composite predicate. We have so far represented this as a composite function built using our predicate transformers, but we have now hit its limitations. The intended behavior of the drop function suggests that we should model our predicate as a list somehow so that we can get at:

  1. The sub-predicates which make up the composite
  2. The operation used to glue all these subpredicates together (ands, ors, and nots.)

There already exists a family of programming languages that achieves exactly what we're looking for: representing code as data.

Enter: LISP

LISP (LISt Processor) was invented by John McCarthy back in ~1960. It is a language whose code is written as a data structure: a series of nested lists termed s-expressions.

An s-expression is either an atom (like 5 or cat) or a list (x y) where x, y are s-expressions. The first element in the list is assumed to be a function, and the rest of the items are its arguments. For example, here's how a compound predicate might look in a modern LISP, Clojure:

    (defn my-pred [x] (and (zero? x) (or (even? x) (pos? x))))
    ;; this defines a function `my-pred` that takes an argument x and asks:
    ;; is x zero, and is x positive or even? If so, return true; else false.

Note: I'll have to write something on the history of LISP at some point, because it is an incredibly influential and beautiful language. For example, it introduced the notion of garbage collection, aka automatic memory management!

Let's take a page out of LISP's book and try representing our predicates as nested lists, where the first item in each (sub)list is the boolean operator we are using to glue the predicates together.

    # Here's that same expression, but in our new lispy thing!
    def even(x): return x % 2 == 0
    def pos(x):  return x > 0
    def zero(x): return x == 0 

    p = [and_, zero, [or_, even, pos]]

But now the question is: how do we actually evaluate our new representation of our compound predicate p on some input, say 5?

We need to write a mini interpreter! Luckily, because of the uniform syntax of LISPs, this is surprisingly easy (compared to writing an interpreter for pretty much any other language syntax out there).

Try it yourself before looking at a solution below.

    def eval(expr): 
        if not isinstance(expr, list):
            # we've hit something that is either a function or a plain value
            return expr
        else: 
            if len(expr) == 0: 
                raise SyntaxError("the predicate expression must have at least 1 element")
            if len(expr) == 1: 
                if callable(expr[0]): 
                    raise SyntaxError(f"Please provide an argument to {expr[0]}")
                raise SyntaxError(f"Expected function as the first argument. Got {expr[0]}, which is of type {type(expr[0])}")
            if len(expr) == 2: 
                if not callable(expr[0]): 
                    raise SyntaxError(f"Expected function as the first argument. Got {expr[0]}, which is of type {type(expr[0])}")
                return expr[0](eval(expr[1]))
            if len(expr) == 3: 
                if not callable(expr[0]): 
                    raise SyntaxError(f"Expected function as the first argument. Got {expr[0]}, which is of type {type(expr[0])}")
                return expr[0](eval(expr[1]), eval(expr[2]))
            raise SyntaxError("Functions that take more than two arguments are not supported")

When evaluated on p, this yields a predicate that does exactly what we want!

    print([(i, eval(p)(i)) for i in range(-10, 10)])
    
    # [(-10, False),
    # (-9, False),
    # (-8, False),
    # (-7, False),
    # (-6, False),
    # (-5, False),
    # (-4, False),
    # (-3, False),
    # (-2, False),
    # (-1, False),
    # (0, True), <<- what we expected from the predicate!
    # (1, False),
    # (2, False),
    # (3, False),
    # (4, False),
    # (5, False),
    # (6, False),
    # (7, False),
    # (8, False),
    # (9, False)]    

The real upshot of this representation is that it is now pretty straightforward to define drop and other functions like it:

    def drop(expr, p_to_remove):
        # walk the top level of the expression, removing the first
        # function whose name matches p_to_remove
        if not expr:
            return []
        if callable(expr[0]) and expr[0].__name__ == p_to_remove.__name__:
            return expr[1:]
        return [expr[0]] + drop(expr[1:], p_to_remove)

    dropped_zero = drop(p, zero)
    print(dropped_zero)
    # [
    #     <function __main__.and_(p1, p2)>,
    #     [
    #         <function __main__.or_(p1, p2)>,
    #         <function __main__.even(x)>,
    #         <function __main__.pos(x)>
    #     ]
    # ]

    print([(i, eval(dropped_zero)(i)) for i in range(-10, 10)])

    # [(-10, True), --> you can be even (but not positive)
    # (-9, False),
    # (-8, True),
    # (-7, False),
    # (-6, True),
    # (-5, False),
    # (-4, True),
    # (-3, False),
    # (-2, True), 
    # (-1, False),
    # (0, True), 
    # (1, True), --> you can be positive (but not even)
    # (2, True), --> you can be even (and positive)
    # (3, True), 
    # (4, True),
    # (5, True),
    # (6, True),
    # (7, True),
    # (8, True),
    # (9, True)]


    # Nice :D 

Powering Down

To bring this full circle, let's summarize.

  • We started out wanting to combine/manipulate predicates in a flexible way.
  • We assumed that since a predicate acts like a function, we have to model it using only functions.
    • We used predicate transformers as a composable abstraction for building complex predicates
    • We ran up against an inability to access sub-predicates in a composite predicate, because all the sub-predicates were hidden in the parent function.
  • Then we turned to creating a data structure that could explicitly model the compositional structure in our predicates.
    • This (plus our eval method) allowed us to write a predicate transformer like drop, which wasn't possible before.

My major takeaways are:

  • Function composition is a one-way operation.
  • Where we ended up demonstrates the power of modeling domain concepts as data structures (rather than procedures/functions/etc).

That's it for now!! Next time, I'll talk about a particular predicate transformer that can give a semantics for a programming language called GCL.

Notes:

  • The way I've defined the interpreter for this LISP is not as general or extensible as it could be. For a more complete toy LISP designed to work with things other than predicates (e.g. lambdas, variables, etc.), I would highly recommend checking out this awesome blog post from Michael Nielsen.

Tags: higher-order functions, LISP

