Resumé-Driven Development Part 3: Dgraph because it sounds cool

In which we integrate Dgraph into our app in order to model user relationships.

Kris Kelly

So last time we started building our app as a GraphQL API with some basic user endpoints. It had no database, which was fine for demonstration purposes, but it meant that none of the user data we put into the system was actually being stored. We need a database to store our precious data, so that we can mine it for nefarious business purposes later on.

But this tutorial won't cover any of the various ways in which companies mine your data for $$$. If I knew how to do that, I wouldn't be here writing blog posts ZING. For now we are innocently building a database of user data in order to be able to model the relationships between our users, to track when users like/match each other, who lives in the same city, etc. etc. We could certainly do this the old-fashioned way, by spinning up a traditional database like MySQL or Postgres and dumping the data into tables. How boring, how practical!

What we are going to do instead is utilize a fancy graph database I found on the internet called Dgraph. If you've got a technical background, you're probably familiar with the concept of a graph database. But as I mentioned in an earlier part of this tutorial, I am going to pretend that you are not.

Graph databases are cool because instead of modeling data as rows in a table, you model it as a graph, i.e. as nodes and edges. For instance, instead of a user being represented by a row in a users table with fields called name, email, etc., we model an individual user as a node in the graph, and each of its edges corresponds to some piece of data about the user. There's a "name" edge, an "email" edge, etc. This gets interesting when the edges start pointing to other nodes, e.g. an edge called liked that points to another user that the first user "liked" via the app.
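To make that a bit more concrete, here's a rough sketch of how a user node and its edges might look as Go structs, since Go is what we'll be writing anyway. The struct and field names here are purely for illustration (we'll define the real schema further down); the idea is that scalar fields are edges holding plain values, while the Liked field is a set of edges pointing at other user nodes:

package main

import "fmt"

// Hypothetical sketch of a user node and its edges, for illustration only.
type User struct {
	UID   string `json:"uid,omitempty"`   // the node itself
	Name  string `json:"name,omitempty"`  // a "name" edge holding a string value
	Email string `json:"email,omitempty"` // an "email" edge holding a string value
	Liked []User `json:"liked,omitempty"` // edges pointing at other user nodes
}

func main() {
	// "Alice liked Bob" becomes a liked edge from Alice's node to Bob's node.
	alice := User{
		Name:  "Alice",
		Email: "alice@example.com",
		Liked: []User{{Name: "Bob"}},
	}
	fmt.Println(alice.Liked[0].Name) // Bob
}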

As to why we would do this, I think the simplest explanation is that any dataset with a high number of relationships amongst the various data points is better represented in a format where we can trace those relationships effectively. Do you have a family tree, and do you want to know the names of all your second cousins? There are so many of them, it's hard to keep track, right? I mean, I have that list above my bed but even then I can barely distinguish between the ones on the "love" list and the ones on the "hate" list. What a conundrum, right?

Well, if you modeled that data as a graph, you could easily trace your second cousins, and all your "friends" from middle school too. You could add a hated_by_me edge pointing from your user to each of them, and from then on you'd just need to traverse the hated_by_me edges to keep track of all the friends and family members that you want to remember to hate.

You could do this with a traditional database, but you'd have to store that data in a set of tables that modeled the relationships between users, and then you'd need to write a query to self-join against the users table umpteen times to traverse the graph of that data. Definitely doable, but fairly inefficient.

So along comes Dgraph. Dgraph sells itself as a "native GraphQL database", which not only sounds cool, but is also a good fit for what we are doing now, seeing as we are building a GraphQL API. As such, the queries that we make to Dgraph will ultimately end up looking very similar to the ones we make to our API.

So anyway, let's get started with Dgraph. There are a whole bunch of ways to install the thing, but since we already have our local Kubernetes cluster up and running, we are going to take that approach and download the simplest Kubernetes config file that they have available. Go ahead and download this file into our deployments/ directory:

$ wget https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-single/dgraph-single.yaml -O deployments/dgraph-single.yaml

Now add it to our Tiltfile:

k8s_yaml('deployments/api.yaml')
k8s_yaml('deployments/dgraph-single.yaml') # <-- Add this line

docker_build_with_restart('dating-app/api', '.',
...

k8s_resource('api', port_forwards=[3000])
k8s_resource('dgraph', port_forwards=[9080]) # <-- And this one

Now you should see a new entry for dgraph in the Tilt console, and if you're lucky, it should be spitting out a bunch of nonsense garbage that you don't need to understand. That's just Kubernetes setting up the dgraph cluster and getting the processes up and running. If you see errors instead, and the dgraph entry is all red and error-y, then you've got a problem. Please pick up the phone and dial emergency services and tell them "the database is broken". Say only those words, and make sure to repeat slowly if they ask for clarification. Just kidding, don't call 911 without a real emergency! And there's your PSA for the day.

Now that we've got Dgraph up and running, the next thing we need to do is set up our Dgraph database schema.

First we need to grab a couple of dependencies:

$ go get github.com/dgraph-io/dgo/v2
$ go get google.golang.org/grpc

dgo is the Dgraph Go API client, and grpc is a fancy tool originally created by Google that we use to communicate across service boundaries. Here's a nice introduction, if you're interested. If you're familiar with REST already, then here's a confusing explanation: RPC means "remote procedure call", and at one point REST was the new hotness that more or less replaced RPC, but now RPC has replaced REST in the sense that gRPC is gaining traction, so now the old thing is new again. Confused? Good! Let's continue.

Next, let's make a file called internal/dgraph/client.go and put the following in there:

package dgraph

import (
	"flag"
	"log"
	"os"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"

	"google.golang.org/grpc"
)

var (
	dgraph = flag.String("d", os.Getenv("DGRAPH_HOST"), "Dgraph Alpha address")
)

// Client is our Dgraph API wrapper
type Client struct {
	conn *grpc.ClientConn
	*dgo.Dgraph
}

// NewClient generates an instance of our client
func NewClient() *Client {
	return &Client{}
}

// Connect to GRPC
func (c *Client) Connect() {
	flag.Parse()
	conn, err := grpc.Dial(*dgraph, grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	c.conn = conn
	c.Dgraph = dgo.NewDgraphClient(api.NewDgraphClient(c.conn))
}

// Close the connection
func (c *Client) Close() error {
	return c.conn.Close()
}

Now let's make a new executable called cmd/dgraph/setup.go:

package main

import (
	"context"
	"io/ioutil"
	"log"
	"os"
	"path/filepath"

	"github.com/dgraph-io/dgo/v2/protos/api"
	"github.com/kriskelly/dating-app-example/internal/dgraph"
)

func readSchema() []byte {
	wd, err := os.Getwd()
	if err != nil {
		log.Fatalln(err)
	}
	path := filepath.Join(wd, "./internal/dgraph/dgraph.graphqls")
	b, err := ioutil.ReadFile(path)
	if err != nil {
		log.Fatalln(err)
	}
	return b
}

func main() {
	log.Println("Setting up the Dgraph schema...")

	client := dgraph.NewClient()
	client.Connect()
	defer client.Close()
	op := &api.Operation{}
	op.Schema = string(readSchema())
	if err := client.Alter(context.Background(), op); err != nil {
		log.Fatal(err)
	}

	log.Println("Ran Alter Schema on DGraph")
}

And finally, let's make a schema file called internal/dgraph/dgraph.graphqls:

name: string @index(exact, fulltext) @count .
email: string @index(exact) .
password: password .
liked: [uid] @count @reverse .
rejected: [uid] @count @reverse .
matched: [uid] @count @reverse .
lives_in: uid @reverse .

type User {
    name
    email
    password
    matched
    liked
    rejected
    lives_in
}

type City {
    name
}

The code in client.go just wraps our dgo API client with some additional code to make and close a connection to the Dgraph server; note that the Client struct embeds *dgo.Dgraph, so we can call the dgo client's methods (like Alter below) directly on our own client. Our dgraph.graphqls file defines the database schema that we'll be using, and we'll go over that in a bit. The setup.go executable glues these things together: it reads the schema file, opens a database connection, and runs an Alter operation to modify the schema to match our schema file, like so:

	op := &api.Operation{}
	op.Schema = string(readSchema())
	if err := client.Alter(context.Background(), op); err != nil {
		log.Fatal(err)
	}

Now we can dig into the schema file itself. The first thing this file does is define the data types for the different kinds of edges that we can have for our users:

name: string @index(exact, fulltext) @count .
email: string @index(exact) .
password: password .
liked: [uid] @count @reverse .
rejected: [uid] @count @reverse .
matched: [uid] @count @reverse .
lives_in: uid @reverse .

Some of this might seem self-explanatory. For instance, name: string tells us that the name field should be a string. For the other fields, [uid] indicates that the edge points to a list of other nodes, in this case other users, while the bare uid on lives_in points to a single node, which for us will be a City. So in this way we can keep track of who liked/rejected who, where people live, etc.

The @index, @count, etc. parts set up database indexes for those fields. @count indexes the number of edges on a predicate, so we could, for instance, efficiently filter users by how many other users they've liked. fulltext works much like a full-text index in Postgres or MySQL, and @reverse lets us traverse an edge backwards. That means that instead of having to maintain a separate liked_by edge, we can look up all the users who liked us by following the liked edge in reverse.
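We'll cover querying properly in the next post, but just to give you a taste of what that reverse lookup might look like, here's a rough sketch using the client we wrote above. The uid is made up, this isn't part of the app (yet), and the ~liked syntax is how Dgraph walks the liked edge in reverse:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/kriskelly/dating-app-example/internal/dgraph"
)

// whoLikedMe finds everyone with a liked edge pointing at the user
// with uid 0x1 (a made-up uid), by traversing ~liked.
const whoLikedMe = `{
  liked_me(func: uid(0x1)) {
    ~liked {
      name
    }
  }
}`

func main() {
	client := dgraph.NewClient()
	client.Connect()
	defer client.Close()

	txn := client.NewReadOnlyTxn()
	defer txn.Discard(context.Background())

	resp, err := txn.Query(context.Background(), whoLikedMe)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(resp.Json))
}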

The rest of the schema file defines the various data types for our nodes. In this case we just have User and City:

type User {
    name
    email
    password
    matched
    liked
    rejected
    lives_in
}

type City {
    name
}

Types are defined more or less the same way they are in GraphQL, which fits in with Dgraph's branding as a GraphQL database. I'm not entirely sure what happens if you try to create a user with an edge that is not defined on the type, but I'll do that at some point and report back.

Now that we've got our code set up properly to create the database schema, let's run it. Note that we are not containerizing this script just yet, as it's more of a one-off. For now we can just run it in the terminal:

$ DGRAPH_HOST=localhost:9080 go run cmd/dgraph/setup.go

A couple of things to note about this: we specify the Dgraph host via an environment variable (the one we referenced in client.go earlier), and that environment variable points to the port we forwarded to localhost in our Tiltfile. Port-forwarding like this lets us reach any service running on our Kubernetes cluster as though it were running locally, so we can poke at the database at our leisure to do things like setting up the schema, verifying the data, etc. We'll look into examining the data later on.

For now, if you run the script above, you should see a message indicating that the schema was altered. Congratulations to both of us! Congratulations to you for managing to follow my instructions, and also congratulations to me for not f'ing up the instructions. Wins all around.

Anyway, because everyone's attention span these days is about as long as my middle finger, I'm going to try to keep these posts short and sweet. We've successfully set up our Dgraph database and schema, so the next post is going to be all about querying Dgraph and the various ways of making it do our bidding. See you next week!