Thursday, June 7, 2018

Using the Crunchy Data PostgreSQL Operator - A Walkthrough

What is an Operator?


The question should really be "What is a Kubernetes Operator?" since the word "operator" appears in many contexts.

In the Kubernetes context, an Operator abstracts away the Kubernetes resource configurations for a specific application run on Kubernetes.  It deploys, manages and monitors those resources. There are operators for all sorts of things - logging, databases, even CI/CD integration. 
More specifically, the Operator creates a Kubernetes Custom Resource Definition (CRD) for specific technologies.


What does the Postgres Operator do?


Crunchy Data provides enterprise Postgres support to clients. The Postgres Operator is their community edition for Kubernetes Postgres applications.

The Postgres Operator deploys Postgres databases on your Kubernetes cluster. Databases have very specific requirements compared to other applications, persistence being chief among them. The Operator leverages Kubernetes Persistent Volumes for this purpose. It exposes a Service for communication inside the cluster, and it uses Secrets for database user authorization purposes. Oh, and it creates Pods that run your actual database.

Thanks to the Operator, we don't have to worry about making all of the cluster level resources work together - the operator does that for us.

To talk to the operator, the user gets a command line tool called pgo, and several scripts to deploy the Operator onto a working Kubernetes cluster.

The PostgreSQL Operator can:
  • deploy Postgres clusters of the latest version with all the bells and whistles
  • tag database clusters with metadata
  • control user access
  • back up databases, both from snapshot and by timestamp
  • collect and store metrics
  • be customized further, since it is an open source tool



A note of disambiguation: "Postgres cluster" is a Postgres-specific term that refers to the particular way Postgres databases are set up for access and maintenance. It has absolutely nothing to do with "Kubernetes cluster". In my head I think of "Postgres cluster" as "Postgres database".


Installing and using the Operator


I wanted to learn about how to install and interact with Operators, so I explored this one's user documentation and ran through the demo installation. I created a walkthrough video with the help of a coworker. In the video, I deploy a Postgres Operator onto a Minikube Kubernetes cluster using the following steps:
  1. Set up a cluster
  2. Choose and set a namespace
  3. Set env vars for make
  4. Run make pull and make deployoperator
  5. Get pgo binary from source but realize binary names differ by OS

Note: The pgo binary download is a single download that comes with separate binaries for Mac, Linux, and Windows, as well as a number of configurations. In my demo, I alias "pgo-mac" to "pgo" to be consistent with the documentation.

To simulate how an app would interact with a database, I create a separate Pod, with an init-container that creates the database, and a postgres container to simulate the app that would access the database. I also run a Cron Job to create periodic backups of my database. The source code for that lives here. Here are the steps:


  1. The app accessing the database will run in a pod that has an init container that sets up the database with pgo
  2. centos for the init container, postgres for the database app
  3. Kubernetes Cron Job will back up the database every 5 minutes (for demo purposes)
  4. Exec into database app container (mypod) to interact with database (in database pod)
  5. Secrets have been set up as env vars so you can immediately access postgres without login

For restoring a database from a backup, I run pgo locally from my computer instead, and port-forward the Service IP address to my localhost. Here are the steps:

  1. Configure pgo client
  2. kubectl port-forward operator service and database service
  3. Create backup following these instructions
  4. Exec into mypod to access restored database and see restored data

Here is the video of the walkthrough, with comments from a coworker, who preferred to remain anonymous. Keep in mind that it's a live walkthrough, with little glitches and minor bumps; please feel free to click around. I hope to put some breaks into the video in the near future to make it more user friendly.





Wrapup and notes



  • Remember pgo has different names depending on OS
  • Postgres clusters are not Kubernetes clusters
  • Postgres cluster names need to be unique - if you try to deploy another cluster with the same name, nothing will happen 
  • Must have Mercurial installed on machine (I ran into this for one of the dependencies)
  • Although there is a separate script to create persistent volumes on the host file system, I did not use it, and relied instead on the default hostpath (for which no extra steps are needed). For actual production databases, the script is a handy reference for a more customized setup.


        Would I use this Operator?


        The short answer is yes.

        The documentation is fairly extensive. There is even a helm chart to deploy the operator. The database backup and restoring worked perfectly every time. There are a lot of supported configurations that look really useful, metrics and logging being among them. I started looking into this tool right as it was undergoing a major version upgrade, and I ran into some documentation hiccups and a few bugs (that may or may not have been version related). When I mentioned these concerns, the code maintainers were immediately on top of it, and assistance and fixes were underway. Of course, when it was possible for me to do so, I submitted some doc changes myself.

        The folks at Crunchy Data are friendly, responsive and easy to work with. I really appreciated their receptiveness to my doc changes and their willingness to explain and follow up on bug reports. Additionally, there are quite a few Crunchy Data Postgres Operator resources on YouTube, so I encourage folks to run a quick search and check those out.




        Saturday, May 5, 2018

        KubeCon Europe 2018 Copenhagen - Personal Retrospective

        A bit of background

        For those of you who don't know, I work on this really cool technology called Kubernetes. I can talk about it all day if you let me but that's not what this post is about. Kubernetes is an open source project that anyone can use and contribute to. Hit me up if you want to get started contributing and I'll show you the ropes.

        KubeCon is the big conference that happens several times a year around the world that is all about Kubernetes. It is huge, it is fairly oriented toward corporate sponsorship, and it is attended by mostly men, but that's a different post. I love KubeCon because I learn a lot every time I go and because of the awesome people I meet.

        To me, the best part about Kubernetes is its community. And I do love the technology aspect, I do.

        Detour.

        I am a bootcamp graduate. I believe that my particular bootcamp, Ada Developers Academy in Seattle, really set me up for success in a way not all bootcamps do. The program is 11 months, tuition-free with a ~10% acceptance rate, and 5 of those 11 months are spent in an industry internship. The program is for women and nonbinary folks. I was lucky that my internship place, Samsung SDS, offered me a full time position, and because I really enjoyed working on Kubernetes tooling and automation, I accepted and stayed.

        I got involved with the upstream community because one day I tried to submit a pull request and had a heck of a time figuring out how to do just that. I filed an issue, was directed to some folks involved with contributors' experience, they agreed their contributor guide wasn't the greatest, and offered me the opportunity to fix it.

        So I did.

        And I had so much help. And I met so many wonderful people. And I learned about the testing and pull request automation code base. And now, as a result, this KubeCon, I was asked to lead an entire workshop on how to contribute, together with Josh Berkus, who has been working on this for a lot longer and in different areas. Dream team!

        Back to why my favorite part about Kubernetes is its community.

        As a female engineer who is a bootcamp graduate, in many ways I have two strikes against me. Many of my classmates struggle with getting recognition, advancement, and mentoring. Some of this struggle is universal. Some of these things happen to me also.
        But in the Kubernetes community, I feel welcome, I feel listened to, and my ideas are respected. I am making friends.

        What I learned for myself from this KubeCon

        So I'm at KubeCon because I helped new contributors. But I'm also there as a developer, a technical contributor myself. I want to be very clear on that, because, despite the community being very self-aware, I've already been asked to consider nontechnical career tracks. And the reason is that people see how passionate I am about growing the community and helping onboard people (which by the way totally takes tech skills). I do have several strengths that I bring with me from my previous career as a classical collaborative pianist, vocal coach, and piano teacher.

        One of my jobs was to teach young children piano, which I believe is one of the hardest skills known to humankind. And their parents expected that I should somehow get them to like it. I got really good at explaining complicated things in fun ways to hyperactive small people whose brains haven't fully developed yet.

        So here's the thing. Algorithms are hard. Systems design is hard. Those are things you need to study, grok, and gain experience in.

        APIs and tooling? Not hard. What makes them hard is that the documentation for them is terrible. If background, definition, and use cases were logically presented for everything involved? Most people could do it.

        I really believe that.

        My passion for onboarding people and finding mentoring, teaching, and documentation that helps people at all levels in tech is not only because I love teaching and community.

        It is that I am upset at how obtuse and unusable so much of this is. Many of the roadblocks to working in tech aren't because you don't understand a concept. They are because you have no idea how two things hook together, and no one will tell you in a reasonable way. And writing code is tricky enough - but if you don't know which API objects will give you the thing you want, you can code trees in circles around me and it will be no use. Then you will feel dumb and the myth that you have to be "talented" or "intelligent" to be able to write software persists. This has happened to me when learning new tooling so many times.

        I can teach illiterate six year olds to play Bach minuets on the piano. I fully believe that we can do a better job helping educated, passionate adults use our tools.

        I want to write code. And I want to take others with me.

        That is what I learned this KubeCon.




        Monday, April 30, 2018

        Fun with RBAC, or How I Hacked Into My Own Kubernetes Cluster

        This is an exploratory post, where I describe a problem I ran into and the workaround I found. I make no promises of my reasoning being correct. I'm just sharing in the hopes it might help someone, start a discussion, or maybe encourage someone to explain in a comment.

        Last week, I ran into a very fun problem.

        I was supposed to deploy a custom helm chart for an app on a new Kubernetes cluster.
        Basic steps:

        1. spin up a Kubernetes cluster on GKE
        2. deploy a chart to it from our chart repo on Quay
        3. expose the deployment

        For Step One, I followed the Quickstart Guide from Google Cloud and ran from my terminal

        gcloud config list

        to make sure I was in the correct project space. Then, I ran

        gcloud container clusters create guinscluster

        and waited for the spinner to stop turning.

        To check that I had my nodes up, I ran

        gcloud container clusters get-credentials guinscluster

        so that I could use kubectl to mess with my cluster. A quick run of

        kubectl get nodes

        confirmed that I had a working cluster with three nodes of Kubernetes Version 1.8.8-gke-0. Yay!



        For the next step, I knew I needed to refresh my knowledge of helm a bit. Again, I used their Quickstart guide for help. The chart repo I had to work with used a script to generate a Chart.yaml in CI, which would then be deployed to our app registry on Quay. I had to generate my own local Chart.yaml from a Chart.yaml.in file, which wound up being confusing.

        Anyhow, I went ahead and followed the guide and ran

        helm init

        This deploys Tiller, Helm's server-side component, as a pod in your cluster. Tiller is what enables Helm to do its thing. To check whether I successfully started Tiller, I ran

        kubectl get pods --all-namespaces

        You must run this check with the --all-namespaces flag, because Tiller gets deployed into the kube-system namespace.

        Here's what I saw:


        Yay, tiller works!

        That was to be the end of my success for the day.

        I followed the guide and attempted to run

        helm install ./myawesomechart

        and got the following fun error:

        Error: release chart-technical-on-boarding failed: namespaces "default" is forbidden: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "default": Unknown user "system:serviceaccount:kube-system:default"

        So that was cool. No matter what I did, I could not install that chart. My coworkers tried to help me out by suggesting I deploy it with a name, or into a namespace, or run helm package first, but no matter what I tried, I could not get this to work. Nothing in the helm user guide had prepared me for this.
        I decided clearly the problem was I didn't know enough about Helm and went to this tutorial to create my own helm chart from scratch. 

        helm install --name example mychart-0.1.0.tgz
        Error: release example failed: namespaces "default" is forbidden: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "default": Unknown user "system:serviceaccount:kube-system:default"

        Same problem!

        So here is where I don't quite remember how I suspected what the problem was. I Googled around a bit for the error messages and I decided that maybe the problem wasn't with helm, but with my cluster and the access I had to it. RBAC is a fascinating thing that I have run into a couple times, so I went with my hunch.

        I found this article on Google Cloud (and by "found", I mean a coworker sent it to me - teamwork is great, everyone!) and it seemed to indicate that GKE clusters running Kubernetes version 1.6 or later already came with RBAC enabled. This did not mean what I thought it did. However, I ran

        gcloud container clusters update --no-enable-legacy-authorization guinscluster

        anyway, and followed through with the suggested

        gcloud components update

        No change to my ability to install a helm chart.

        Okay.

        So I looked around some more and I found this GitHub issue. Okay. So it seemed the problem was that for clusters with RBAC enabled, one needed to create the correct permissions to actually use helm. I'm still puzzled why it's okay to install Tiller, but not charts. Oh well! So I tried it:

        kubectl create clusterrolebinding --user system:serviceaccount:kube-system:default kube-system-cluster-admin --clusterrole cluster-admin

         which resulted in

        Error from server (Forbidden): clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "guin@fake-email.com" cannot create clusterrolebindings.rbac.authorization.k8s.io at the cluster scope: Required "container.clusterRoleBindings.create" permission.

        You know the feeling when you get a new, different error? It's this frustration mixed with guilt because getting a new error is supposed to be a good thing, it means you're finding out more? But also it could mean you're falling further behind since maybe you're taking the wrong track?

        Anyway. So my question is, if I bring up a shiny new GKE cluster, version 1.9, why on earth would I not have user permissions on it? I'm sure there's a good reason; I just don't know why, and I wasn't expecting it.

        StackOverflow to the rescue. If I could get the "admin" user's password for the cluster, I could pass user and password as flags for the clusterrolebinding and make it happen.

        gcloud container clusters describe guinscluster

        After that it was easy to copy the password from the result and pass it to the above command via flag like so:
        kubectl create clusterrolebinding --user system:serviceaccount:kube-system:default kube-system-cluster-admin --clusterrole cluster-admin --username=admin --password=<adminpw>

        After this, helm install worked like a charm, both on the tutorial example chart, and on guinsawesomechart. I have no idea why this song and dance was necessary, or how I should have spun up my cluster so that my identity had user permissions. This is something I hope to learn more about and bring back here.

        I also, as always, welcome comments.

        Guin out. Time for KubeCon.
























        Saturday, March 24, 2018

        Golang for dummies - Interfaces

        If you just got here, it might be useful for you to follow the previous part of this post, where I talk about Structs and Methods in Golang.


        Interfaces in Golang


        An interface is a collection of methods. Any type that has all the methods that are declared in an interface is said to implement that interface.

        This implementation is implicit, i.e. no extra syntax is needed to declare a certain type as implementing an interface.
        Unfortunately, this means it's also confusing unless you know what to look for.
        Declaring an interface is fairly simple:

        Both types Witch and Unicorn implement the Healer interface, because both have a method called heal() that returns a string. It is in fact important that the signatures for each heal() method be exactly the same. If the Unicorn's version of heal() returned a bool, for instance, it would not be the same method that is stated in the Healer interface (which returns a string), and thus not implement the Healer interface.

        Additionally, we can use type Healer as an argument to the sleep() function. This allows us to write sleep() only once, and have it apply to all Healers (healers gotta get their beauty sleep too).
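The original listing is not shown here, so here is a minimal sketch of what such a declaration could look like. The field names, the returned strings, and the choice to have sleep() return a string instead of printing are my assumptions, and the line numbers mentioned in the text refer to the post's original listing, not to this sketch.

```go
package main

import "fmt"

// Healer is satisfied by any type that has a heal() method returning a string.
type Healer interface {
	heal() string
}

// Witch and Unicorn are illustrative types; their fields are assumptions.
type Witch struct {
	name string
}

type Unicorn struct {
	name string
}

// Both methods match the signature in Healer exactly, so *Witch and *Unicorn
// implement Healer without any extra declaration.
func (w *Witch) heal() string {
	return w.name + " brews a healing potion"
}

func (u *Unicorn) heal() string {
	return u.name + " heals with a touch of the horn"
}

// sleep is written once and accepts any Healer.
func sleep(h Healer) string {
	return fmt.Sprintf("%T is getting some beauty sleep", h)
}

// Compile-time checks: the build fails here if either type stops
// implementing Healer (for example, if heal() started returning a bool).
var (
	_ Healer = (*Witch)(nil)
	_ Healer = (*Unicorn)(nil)
)

func main() {
	fmt.Println((&Witch{name: "Bibi"}).heal())
	fmt.Println((&Unicorn{name: "Corny"}).heal())
	fmt.Println(sleep(&Witch{name: "Bibi"}))
}
```

The blank-identifier assignments are optional, but they make the otherwise implicit "implements" relationship visible to the reader and to the compiler.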


        Let's look at using the above types in main():


        You can see the different printed messages from the two separate heal() methods. The sleep() function works on all Healers. There's something interesting going on with bibi. We declared a Healer named bibi, which we can do, but in order for bibi to have name and heal (i.e. behave like a Witch), we need her to be of type Witch, or, in this case, &Witch. That is happening on line 34, where we are saying that Healer bibi is of type &Witch.
        Both corny and bibi have to be pointers, since their heal() methods accept pointers.

        Here's a bit of a closer look at what is going on underneath the Healer interface:


        The interface changes from holding the value nil of type nil to holding a pointer to a struct (whose name field is set to Bibi), which is of type *main.Witch.
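One way to see this for yourself is to print an interface's value and type with the %v and %T verbs. This is a small sketch, not the original listing; the describe() helper and the Witch field are assumptions made for illustration.

```go
package main

import "fmt"

type Healer interface {
	heal() string
}

type Witch struct {
	name string
}

func (w *Witch) heal() string {
	return w.name + " heals"
}

// describe reports the dynamic value and dynamic type held by a Healer.
func describe(h Healer) string {
	return fmt.Sprintf("(%v, %T)", h, h)
}

func main() {
	var bibi Healer             // declared but not assigned: no value, no type
	fmt.Println(describe(bibi)) // (<nil>, <nil>)

	bibi = &Witch{name: "Bibi"} // the interface now holds a *Witch
	fmt.Println(describe(bibi)) // (&{Bibi}, *main.Witch)
}
```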


        Interfaces are useful for two reasons:



        1. They can be passed as a value to functions, meaning one function can now operate on all the types that implement the interface

        2. New types can implement existing interfaces from outside of their original packages simply by giving the new type all the required methods for that interface.

        To recap that second point:


        Let’s say Joe studies hard and becomes a doctor. We can create a new struct, called Doctor, which has a Method called heal(), which automatically makes a Doctor a Healer, of which Joe is an instance.
        Now, magically, without us writing any extra code, we can make Joe go to sleep, because the sleep() function accepts any Healer as an argument.
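As a rough sketch of that idea (the Doctor fields and the strings are made up for illustration), note that sleep() is written before Doctor exists and never has to change:

```go
package main

import "fmt"

type Healer interface {
	heal() string
}

// sleep already exists and knows nothing about doctors.
func sleep(h Healer) string {
	return fmt.Sprintf("%s Then it is time for bed.", h.heal())
}

// Doctor is a new type added later. Giving it a heal() method with the
// right signature is all it takes to make *Doctor a Healer.
type Doctor struct {
	name string
}

func (d *Doctor) heal() string {
	return "Dr. " + d.name + " writes a prescription."
}

func main() {
	joe := &Doctor{name: "Joe"}
	fmt.Println(sleep(joe))
}
```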

        The empty interface


        Don't worry. It's not actually that scary. It's just a little bit of a logic twist in thinking about interfaces.

        The empty interface, or interface{}, is the interface that is implemented by every type.

        Every type has at least zero Methods, therefore every type satisfies the empty interface. Every single Golang type, built-in or newly coined struct, implements the empty interface by default.


        Why is this useful?


        The empty interface basically acts as a wrapper type for all types, so that a function can accept an unknown type.
        In some languages, such as Ruby, one can have arrays that hold values of different types:

        ["hi", 1, 4.5, {"planet" => "blue"}]


        Go handles this with the empty interface wrapper:



        In main(), we create a slice of empty interfaces. We then append a lot of values of different types to it. This works, because all these values satisfy the empty interface.
        We can then call PrintAll() on the interface slice.
        When we don’t know the type of an incoming value (a log stream is another example), it is helpful for a function to accept the empty interface as an argument.

        Another famous example: in the fmt library, fmt.Print accepts any number of interface{} arguments.
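The listing for this section is missing, so here is a minimal sketch of the idea. PrintAll is the name used in the text; its exact signature and the fact that it also returns the printed lines are my assumptions.

```go
package main

import "fmt"

// PrintAll accepts a slice of empty interfaces, so each element may be of
// any type. It prints every value with its underlying type and also
// returns the printed lines.
func PrintAll(vals []interface{}) []string {
	lines := make([]string, 0, len(vals))
	for _, v := range vals {
		line := fmt.Sprintf("%v (%T)", v, v)
		fmt.Println(line)
		lines = append(lines, line)
	}
	return lines
}

func main() {
	var vals []interface{}
	// Values of different types can all go into the same slice.
	vals = append(vals, "hi", 1, 4.5, map[string]string{"planet": "blue"})
	PrintAll(vals)
}
```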

        Remember the underlying type of an interface?

        This holds true for the empty interface, too.

        Even though each value in vals is of type interface{}, the underlying type is still what it started out as.

        Type assertion

        s can be re-cast as a string, because its underlying type is a string. It cannot be re-cast as a float64, because, well, "Unicorn" is a string.
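A short sketch of a type assertion, assuming s holds the string "Unicorn" as described above. The asString() helper is hypothetical. The two-value form reports failure through ok instead of panicking.

```go
package main

import "fmt"

// asString tries to recover a string from an empty interface value.
func asString(v interface{}) (string, bool) {
	str, ok := v.(string)
	return str, ok
}

func main() {
	var s interface{} = "Unicorn"

	str, ok := s.(string)
	fmt.Println(str, ok) // Unicorn true

	f, ok := s.(float64)
	fmt.Println(f, ok) // 0 false

	// The single-value form, s.(float64), would panic here because the
	// underlying type of s is string.
	fmt.Println(asString(s))
}
```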


        Thank you for joining me on my Golang journey. The more I discover, the more I like it as a programming language. It's versatile, fast, and relatively easy to read. Stay tuned as I discover my next topic, which will probably involve goroutines and concurrency.

        Tuesday, February 13, 2018

        Golang for dummies - Structs and Methods

        This is Part One of a two-part session on structs and interfaces. 


        Before we can talk about the scary thing that is interfaces in Go, we have to talk about structs. 

        Structs are pretty intuitive. They are a way to organize your code into cohesive ideas and then make it convenient to access individual properties of the struct, pass them around, and generally make your life better. I did not experience structs as particularly difficult to understand. Thanks, Go!

        Structs are a composite type. This is as opposed to primitive data types such as int, bool, etc - of which Go has...a surprisingly large number. Find them listed here.

        Structs are basically Golang's version of what in, for instance, Java, would be classes. This is a bit oversimplified, and really just meant to give you an idea to get you started.

        Structs have fields. They can be accessed via the dot operator (.)


        We declare a struct according to the following format:

        type <StructName> struct {<fields>}




        On lines 9-12, we declare the Unicorn struct. It has three fields: age, name, and color. If two fields have the same type, we can put them on the same line, as is shown with name and color on line 11. Fields do not need to be primitive types, by the way; they can be other structs as well.

        On line 14, we declare corny, an instance of type Unicorn. We can see from the output of line 17 that the fields have default values of 0 and the empty string. We later assign corny's fields values, and those are shown in the last line of the output.

        We say, "A Unicorn has a name". This has-a relationship is the definition of what should be a struct field.
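The original listing is not reproduced here, so its line numbers will not match, but a minimal sketch along the same lines might look like this. The assigned values are made up.

```go
package main

import "fmt"

// Unicorn has three fields. Fields of the same type can share a line.
type Unicorn struct {
	age         int
	name, color string
}

func main() {
	var corny Unicorn
	// Unassigned fields hold their zero values: 0 and the empty string.
	fmt.Printf("%+v\n", corny) // {age:0 name: color:}

	// Fields are read and written with the dot operator.
	corny.age = 7
	corny.name = "Corny"
	corny.color = "white"
	fmt.Printf("%+v\n", corny) // {age:7 name:Corny color:white}
}
```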


        We can also create a new Unicorn with its fields at initialization. This is called a literal.



        If you use a literal to instantiate a struct, all the fields must match types and must be in the order that they appear in the struct declaration. The following things do not work:





        Only if you specify the field name in the declaration will it work regardless of order:
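Here is a small sketch of both literal forms, assuming the same Unicorn struct as before. The forms that do not compile are left as comments so the example still runs.

```go
package main

import "fmt"

type Unicorn struct {
	age         int
	name, color string
}

func main() {
	// Positional literal: every field, in declaration order.
	a := Unicorn{7, "Corny", "white"}

	// Keyed literal: any order, and omitted fields keep their zero value.
	b := Unicorn{color: "white", name: "Corny", age: 7}
	c := Unicorn{name: "Sparkle"}

	// These would not compile:
	//   Unicorn{"Corny", 7, "white"} // wrong order, so the types do not match
	//   Unicorn{7, "Corny"}          // too few values for a positional literal

	fmt.Println(a == b) // true: same field values, whichever form was used
	fmt.Printf("%+v\n", c)
}
```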



        Structs can also be nested. 
         
         
        Notice how we didn't have to call pegasus.MythicalBeast.Name. In fact, pegasus.Name was sufficient. Because a Unicorn is a MythicalBeast, it has an Age and a Name. Just like any other MythicalBeast.
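A minimal sketch of that nesting, assuming MythicalBeast has the Name and Age fields mentioned above. The Color field and the values are made up.

```go
package main

import "fmt"

type MythicalBeast struct {
	Name string
	Age  int
}

// Unicorn embeds MythicalBeast by listing the type without a field name.
type Unicorn struct {
	MythicalBeast
	Color string
}

func main() {
	pegasus := Unicorn{
		MythicalBeast: MythicalBeast{Name: "Pegasus", Age: 300},
		Color:         "white",
	}

	fmt.Println(pegasus.Name)               // the embedded field is promoted
	fmt.Println(pegasus.MythicalBeast.Name) // the long form still works
}
```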


        We cannot talk about structs without talking about Methods.

        Methods are functions that get called on a specific type, called a receiver. This sounds a lot like Java or Ruby classes. The difference is that we do not declare them as part of a struct declaration. Instead, we tell the method which type can "receive" the method. The receiver can be any named type defined in the same package, not just a struct (to add methods to a type that’s not yours, you define your own type based on it, or embed it).
        Methods are called using the dot (.) operator, just like struct fields.

        The basic format for a method declaration is as follows:

        func (<ReceiverType>) <MethodName>(<args>) <returnType> {}


        You can see that on line 13 we have a heal() method that takes a *Witch as a receiver. Then, on line 20, we can call heal() on bibi, a *Witch, with the dot operator.
        By the way, bibi is a *Witch, not a Witch, because we created her with the new() function, which returns a pointer to a newly allocated value. It turns out that methods, unlike functions, do not care whether you call them on a pointer or on an addressable value; Go inserts the & or * for you:


        In general, you should have your methods accept pointers so that:
        a. the value does not get copied on each method call and

        b. we can potentially modify the receiver in the method.
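To make both points concrete, here is a small sketch (not the original listing). The spells counter and the second witch, tina, are hypothetical additions that show a pointer receiver modifying the value it was called on.

```go
package main

import "fmt"

type Witch struct {
	name   string
	spells int
}

// heal has a pointer receiver, so it can modify the Witch it is called on
// and the Witch is not copied on each call.
func (w *Witch) heal() string {
	w.spells++
	return w.name + " casts a healing spell"
}

func main() {
	bibi := new(Witch) // new returns a pointer: bibi is a *Witch
	bibi.name = "Bibi"
	fmt.Println(bibi.heal())

	var tina Witch // tina is a plain value, not a pointer
	tina.name = "Tina"
	// Go rewrites this call to (&tina).heal() because tina is addressable.
	fmt.Println(tina.heal())

	fmt.Println(bibi.spells, tina.spells) // 1 1
}
```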

        For more on pointers, please read my previous post on pointers.

        Bibi Blocksberg was a staple of my childhood. Learn more about her.




        Embedded Structs

        Remember nested structs, above? When structs are nested, we have access to the fields of the parent struct. With methods, the same holds true.
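A minimal sketch of that setup, assuming an Adventurer type with a walk() method and a Witch type that embeds it. The field name and the returned strings are made up.

```go
package main

import "fmt"

type Adventurer struct {
	name string
}

func (a *Adventurer) walk() string {
	return a.name + " walks on"
}

// Witch embeds Adventurer, so Adventurer's fields and methods are promoted.
type Witch struct {
	Adventurer
}

func (w *Witch) heal() string {
	return w.name + " heals a scraped knee"
}

func main() {
	joe := &Adventurer{name: "Joe"}
	bibi := &Witch{Adventurer: Adventurer{name: "Bibi"}}

	fmt.Println(joe.walk())
	fmt.Println(bibi.walk()) // promoted from the embedded Adventurer
	fmt.Println(bibi.heal())

	// joe.heal() would not compile: an Adventurer has no heal() method.
}
```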



        bibi, a Witch, is also an Adventurer.
        bibi has additional skills that the average joe does not have.
        They both have the ability to walk(), but only bibi can heal(). bibi can use Adventurer methods because the Witch struct embeds the Adventurer struct ("a Witch is an Adventurer") and as such has access to all the methods that Adventurer implements. The reverse does not hold true: joe is not a Witch, so he cannot use Witch methods.

        Methods are pretty awesome. Since they are not defined inside a struct, but rather with a receiver, you can add methods to existing types in your package with very little trouble. I personally found the receiver concept a little tricky at first (why are there now three things in my function declaration?) but it helps me to think of the receiver as saying "This method operates on this struct". I hope this was helpful to some of you!

        Please stay tuned for the next installment of my Golang journey, regarding Interfaces.
        Sources: 
            Golang playground
            Tour of Go
            Golang book chapter on structs