Building a Reusable Little Service, Part I

Django/Celery on Docker, CodePipeline, Elastic Beanstalk, with an RDS database.


Motivation


[Image: Minerva’s Active Learning Forum, from “Active Learning Forum: A New Way to Learn” (YouTube)]

Minerva’s Active Learning Forum (ALF) started as just an online dashboard and virtual classroom. As Minerva grew, so did the requirements for this platform: more types of data to store, and more features and views to provide. Originally a full-stack web application with Backbone and Marionette.js on a Django back-end, the monolithic codebase (internally named picasso) now hosts over a dozen Django apps, scores of models, a full suite of tests that takes half an hour or more to run, and nearly a hundred shared dependencies for the back-end alone.

To improve development, testing, and deployment speed, as well as reduce cognitive overhead, the ALF team decided to move toward a service-oriented architecture (SOA) whose services interact through clearly defined, ideally RESTful, APIs. We have already seen great success with running Licode, our A/V system, as a service, and more recently with refactoring “ClassGrader”, where faculty grade student-submitted assignments, as a React webapp.

Requirement definition

The first step we took toward an SOA was a brainstorming session with three ALF engineers. First, we listed ALF’s dependencies that were already considered standalone services. One is our Pubsub server, which allows real-time messages and events to be sent between users in the same classroom. Another is Looker, a third-party application that connects directly to our database.

Then, we started identifying feature seams in ALF that could (and should) be similarly independent. On the front-end, for example, we have a set of faculty-only dashboards displaying grades and other data for students in their sections, pages for section administration and enrollment changes, and pages for modifying individual classes. Many models and patterns are conveniently shared between them, but those models (along with assets and dependencies) become more tightly and unpredictably intertwined as we build more views that require different combinations of them. On the back-end, we might want monitoring, stats collection, and aggregation to run independently, since a sudden influx of events can (and has) degraded response times for general web requests on the same shared server. Emails and interfaces for third-party integrations like Google Docs are also good candidates for abstraction, as we have various implementations scattered throughout the codebase.

With this in mind, we wanted to experiment with building a templated base infrastructure for ALF services. For such a template to be useful, it necessarily has to share many of the technologies already used by ALF in picasso, and include features for a developer to easily and confidently start a production-ready service. We decided that, for an MVP, the stack would include the following:

  • Django 1.11 / Python 2.7
    • Strong patterns for API endpoints, both reading and writing, as well as for permissions, will be needed at a higher level. It’s unclear whether Django REST Framework (DRF) should be the standard.
  • Monitoring and alerts
    • Currently, only the built-in AWS monitoring is implemented.
  • Automated building, testing, and deployment
    • A CI/CD pattern seemed the most robust, particularly with the deployment abstraction Elastic Beanstalk provides and the automatic hooks from CodePipeline.
  • Simple local development environment setup
    • To meet this need, as well as uniform local/testing/deployment environments, we decided to use containerization with Docker to deterministically create images and isolate dependencies.

Background learning

To gain a foundational understanding of service-oriented architecture, the author (Cheng, the primary engineer on the project) spent some time researching and learning high-level concepts. Two books, Building Microservices by Sam Newman and Production-Ready Microservices by Susan J. Fowler, were invaluable. The usual online resources and documentation for Docker, CircleCI, and AWS were consulted as needed.

Notes on implementation details

The files and folders referenced below are placed in a single repository, which can then be cloned and used as the infrastructure for any application-level code that we’d like to develop and deploy. We’re hoping to be able to open source the work we’ve done thus far, and link to it here if we are indeed able to. For now, you’ll have to live vicariously!

Local layer

Docker

Developers install native Docker for their operating system, which runs as a background service. docker-compose.yml specifies the containers needed for local development; currently only one Python container is needed. This container is described by Dockerfile, which specifies dependency installation and caches each additional layer of dependencies locally. On start, docker-entrypoint.sh activates Python’s venv, runs any Django migrations, starts a Celery worker in the background for asynchronous tasks (a native Redis is used as the broker), and finally starts the web server worker.
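A minimal docker-compose.yml for this setup might look like the following sketch. The service name, port, and volume mapping are illustrative assumptions, not our exact configuration:

```yaml
# docker-compose.yml — illustrative sketch; names and ports are assumptions.
version: '2'
services:
  web:
    build: .                       # image described by Dockerfile
    command: ./docker-entrypoint.sh
    volumes:
      - .:/code                    # mount source for live local development
    ports:
      - "8000:8000"                # expose the Django dev server
```

The entrypoint script then handles venv activation, migrations, the background Celery worker, and the web server in sequence, so `docker-compose up` is the only command a developer needs.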

Django

By convention from picasso, the Django installation lives in a folder named server. A static folder holds static assets. settings.py within config contains the master configuration, and Django’s built-in support for environment settings modules is used to set database configuration for the CircleCI and local environments. URL routing is defined in urls.py.
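An environment settings module in this pattern is just a thin override of the master configuration, selected at startup via DJANGO_SETTINGS_MODULE. The module name and database values below are placeholders, not our actual configuration:

```python
# config/settings_circle.py — a sketch of an environment settings module.
# The module name and all database values are placeholder assumptions.
from config.settings import *  # noqa: inherit the master configuration

# CircleCI's MySQL container listens on localhost with a default test DB.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'circle_test',
        'USER': 'root',
        'HOST': '127.0.0.1',
        'PORT': '3306',
    }
}
```

The test environment would then start Django with something like `DJANGO_SETTINGS_MODULE=config.settings_circle`, leaving the master settings.py untouched.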

The application code itself is in another folder, called little_service_app. This application is registered with the Django installation, and its structure follows Django conventions. apps.py stores some application-level configuration, celery.py currently only supports the local development environment’s Redis, and views.py contains controller code. The templates folder contains view-level code, and migrations holds database migrations. No default front-end has been added yet, so built-in Django templating is used.
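The celery.py bootstrap module follows the standard Celery-with-Django pattern; a sketch is below. The broker URL and app name are assumptions tied to the local Redis mentioned above:

```python
# little_service_app/celery.py — a minimal sketch of the Celery bootstrap.
# The broker URL and app name are assumptions for the local environment.
from __future__ import absolute_import

import os

from celery import Celery

# Make sure Django settings are importable before Celery configures itself.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'config.settings')

# Point the app at the native Redis used for local development.
app = Celery('little_service_app', broker='redis://localhost:6379/0')

# Pull CELERY_* settings from Django and register tasks from installed apps.
app.config_from_object('django.conf:settings')
app.autodiscover_tasks()
```

Supporting other environments would mean sourcing the broker URL from the environment settings module rather than hard-coding localhost.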

Test layer

CircleCI 2.0 makes use of Docker containers, so one Python and one MySQL container are specified in .circleci/config.yml. Dependencies are cached based on changes to requirements.in and saved to venv. Then migrations are run against a fresh database, followed by unit tests. All of the dependency setup and entrypoint scripting is done within the CircleCI configuration file, which needs to be kept in sync with the local and production Dockerfiles.
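The shape of that configuration file is roughly the following sketch; image tags, cache keys, and the manage.py path are assumptions, not our exact config:

```yaml
# .circleci/config.yml — illustrative sketch; image tags and paths are assumptions.
version: 2
jobs:
  build:
    docker:
      - image: circleci/python:2.7    # primary container running the steps
      - image: circleci/mysql:5.7     # fresh database for migrations/tests
    steps:
      - checkout
      - restore_cache:
          key: deps-{{ checksum "requirements.in" }}
      - run: |
          virtualenv venv
          . venv/bin/activate
          pip install -r requirements.in
      - save_cache:
          key: deps-{{ checksum "requirements.in" }}
          paths:
            - venv
      - run: |
          . venv/bin/activate
          python server/manage.py migrate
          python server/manage.py test
```

Note that the pip/virtualenv steps here duplicate what the Dockerfiles do, which is exactly the parallelism that has to be maintained by hand.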

Deployment layer

The build process uses AWS CodeBuild, which follows the specs in buildspec.yml to build a single web server Docker image and upload it to Amazon Elastic Container Registry (ECR). CodePipeline then takes that build and deploys it with Elastic Beanstalk.
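A buildspec.yml for this flow would look roughly like the sketch below; the account ID, region, and repository name are placeholders:

```yaml
# buildspec.yml — illustrative sketch; account ID, region, and repo name
# are placeholders.
version: 0.2
phases:
  pre_build:
    commands:
      # Authenticate the Docker client against ECR.
      - $(aws ecr get-login --no-include-email --region us-east-1)
  build:
    commands:
      - docker build -t little-service .
      - docker tag little-service:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/little-service:latest
  post_build:
    commands:
      - docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/little-service:latest
```

CodePipeline watches for the pushed image artifact and hands it to Elastic Beanstalk, which handles the actual rollout.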

Future aspirations

Finally, cloning the service itself could be significantly automated. We’re starting work on a script that will copy and rename files from the repository and replace the appropriate strings. We’re also investigating a script to automate AWS setup.
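The core of such a cloning script is straightforward; a sketch of the copy-and-rename pass is below. The function name and the placeholder app name are assumptions:

```python
# clone_service.py — hypothetical sketch of the cloning script described above.
# Walks the template repo, copying every file into a new directory while
# rewriting the placeholder name in directory names, file names, and contents.
import os


def clone_template(src, dest, old_name, new_name):
    """Copy the template tree, rewriting occurrences of old_name."""
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        # Rename directories that embed the old app name.
        target_dir = os.path.join(dest, rel.replace(old_name, new_name))
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        for fname in files:
            with open(os.path.join(root, fname)) as f:
                contents = f.read()
            # Rewrite both the file name and its contents.
            out_name = fname.replace(old_name, new_name)
            with open(os.path.join(target_dir, out_name), 'w') as f:
                f.write(contents.replace(old_name, new_name))
```

A real version would also skip binary files and `.git`, but this captures the copy/rename/replace steps the script needs.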

Thanks for joining us on our journey to build a collaborative, all-in-one solution for building and deploying a reusable little service. We’re actively developing this project, so we hope to have more to share soon. Until then!