AWS DevOps from Teams to Production

Teams

DevOps mandates the use of Agile-Scrum and the destruction of silos.  An Agile-Scrum team needs the following roles, drawn from different groups, to ensure that the application being built (or migrated, re-platformed, refactored, or extended) is compatible with the production environments and fit for end-user purpose:

  • Agile Scrum Master
  • Developers
  • Infrastructure (compute, networking)
  • Operations (with runbooks)
  • Security (entire stack including networking)
  • Testing
  • Business Owner with the money and mandate
  • Other SMEs depending on the project
  • Enterprise Architect or Solutions Architect as link to the greater organisation

Companies should not underestimate the complexity of using Agile-Scrum on projects.  The team will need coaching, training, standard tools, processes, reporting, patience and support.  Scaling Agile-Scrum engineering practices across an enterprise is even more difficult.  Agile-Scrum and the related DevOps processes are a long-term investment and need to be supported by the C-suite.

An example of a Continuous Integration (CI) and Continuous Delivery (CD) approach on AWS is given below.  A key issue is that Operations must have the knowledge, skills and runbooks to support the deployed application.  Firms will usually have a ‘hyper-care’ period after a new deployment, in which the development team works with Operations to resolve issues, provide knowledge transfer (KT) and support any code rewriting which may be necessary.  This entails a new approach to organizational management, usually a Change Management process with leadership backing, and workshops to break down the historical mistrust between Dev and Ops.

[Figure: example CI/CD approach on AWS]

Application Development Team:

Using automated tools, standard processes, code and artifact repositories, a proper trunk and branch strategy, and integrated testing (including unit testing), the team builds, packages and deploys the application.  Again, standards are key.  Pick some tools and use them.  The team may choose to use services in AWS such as AWS CodeStar, AWS CodePipeline, AWS CodeCommit, AWS CodeDeploy, and AWS CodeBuild, along with Jenkins, GitHub, Artifactory, TeamCenter, and other similar tools.  The important factor is to map out the DevOps workflow, including the underlying infrastructure, and architect an end-to-end process using standard tools.

Infrastructure Team:

There are two key members.  The first is networking.  VPCs, subnets and environments must be set up, networked and secured.  The second is the underlying compute infrastructure.  Working with the Development team, set up automated templates to deploy the packaged code into the Dev and Test environments and, after sign-offs, into Production.  We want all the environments to be the same, including the underlying node and server infrastructure.  Standard tools that suit the target platform need to be used.  Examples on AWS include CloudFormation, Chef, Puppet, Terraform or Ansible.  The key is to make reusable templates and to view the deployment process as part of the development process.  This leads to Infrastructure as Code (IaC): templates which support the application stack requirements automatically.  Importantly, the company should standardize on one CI engine and one CD engine.
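As a minimal sketch of the reusable-template idea, assuming CloudFormation is the chosen tool, the Python below generates one template per environment from a single function, so that Dev, Test and Production differ only in their parameters.  The environment names, CIDR ranges and the `AppVpc` logical ID are illustrative assumptions, not values from this article.

```python
import json

# Hypothetical per-environment settings; names and CIDR ranges are illustrative only.
ENVIRONMENTS = {
    "dev": {"cidr": "10.0.0.0/16"},
    "test": {"cidr": "10.1.0.0/16"},
    "prod": {"cidr": "10.2.0.0/16"},
}


def build_template(env_name: str) -> dict:
    """Return a CloudFormation-style template (as a dict) for one environment."""
    settings = ENVIRONMENTS[env_name]
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Network baseline for the {env_name} environment",
        "Resources": {
            # "AppVpc" is a hypothetical logical ID chosen for this sketch.
            "AppVpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {
                    "CidrBlock": settings["cidr"],
                    "EnableDnsSupport": True,
                    "Tags": [{"Key": "Environment", "Value": env_name}],
                },
            }
        },
    }


if __name__ == "__main__":
    # Every environment is rendered from the same function, so the structure
    # stays identical and only the parameters differ.
    for name in ENVIRONMENTS:
        print(json.dumps(build_template(name), indent=2))
```

Because every environment comes from the same function, structural drift between Dev, Test and Production shows up as a code change that can be reviewed and tested like any other.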

Security:

A security SME with relevant AWS (or target platform) experience and certifications must be on the team, securing all environments including the network, data, application, development code and executable repositories, databases, and all artifacts being deployed per environment.  This person needs to enforce company standards and will usually come armed with checklists that need to be satisfied in both design and deployment.

Testing:

The entire pipeline needs to be tested using both automation and human intervention (sanity checks included).  Testing should start as early as possible and, as always, use standard tools, processes, and entry and exit criteria.  A test manager needs to be involved to ensure that the standards are implemented.  The testing pyramid described below illustrates the depth of testing which is necessary (and the associated costs).

Unit tests are at the bottom of the pyramid. They are both the fastest to run and the least expensive. Therefore, unit tests should make up the bulk of your testing strategy. A good rule of thumb is about 70 percent of all tests. Unit tests should have near-complete code coverage because bugs caught in this phase can be fixed quickly and cheaply.

Service, component, and integration tests are above unit tests on the pyramid. These tests require detailed environments and, therefore, are more costly in infrastructure requirements and slower to run. Performance and compliance tests are the next level. They require production-quality environments and are more expensive yet. UI and user acceptance tests are at the top of the pyramid and require production-quality environments as well.

All of these tests are part of a complete strategy to assure high-quality software. However, for speed of development, emphasis is on the number of tests and the coverage in the bottom half of the pyramid.

Deploying the Code

Setting up the Source

At the beginning of the project it’s essential to set up a source repository where you can store your raw code, configuration, and schema changes. In the source stage, choose a source code repository such as SVN hosted on EC2, GitHub, or AWS CodeCommit.

Setting Up and Executing Builds

Build automation is essential to the CI process. When setting up build automation, the first task is to choose the right build tool. There are many build tools, such as Ant, Maven, and Gradle for Java; Make for C/C++; Grunt for JavaScript; and Rake for Ruby. The build tool that will work best for you will depend on the programming language of your project and the skill set of your team. After you choose the build tool, all the dependencies need to be clearly defined in the build scripts, along with the build steps. It’s also a best practice to version the final build artifacts, which makes it easier to deploy and to keep track of issues.
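One way to version build artifacts, sketched here under assumptions, is to combine the application version, the CI build number and a short content hash in the artifact name.  The naming scheme and the `artifact_name` helper are hypothetical and not tied to any particular build tool.

```python
import hashlib


def artifact_name(app: str, version: str, build_number: int, content: bytes) -> str:
    """Build a versioned artifact file name (hypothetical naming scheme).

    The short SHA-256 prefix ties the name to the exact bytes that were built,
    which makes it easier to trace an issue back to a specific build.
    """
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{app}-{version}+build.{build_number}.{digest}.zip"


if __name__ == "__main__":
    # Illustrative values only.
    print(artifact_name("orders-service", "1.4.2", 57, b"packaged application bytes"))
```

The same inputs always produce the same name, so a deployment log that records the artifact name identifies exactly what was deployed.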

In the build stage, the build tools will take as input any change to the source code repository, build the software, and run the following types of tests:

Unit Testing – Tests a specific section of code to ensure the code does what it is expected to do. Unit testing is performed by software developers during the development phase. At this stage, static code analysis, data flow analysis, code coverage, and other software verification processes can be applied.
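A minimal sketch of a unit test, using Python's standard `unittest` module.  The `apply_discount` function is a hypothetical piece of application code invented for this illustration.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical application code under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    """Each test checks one small, specific behaviour of the function."""

    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_discount_returns_price(self):
        self.assertEqual(apply_discount(99.99, 0), 99.99)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(10.0, 120)
```

Saved to a file, the tests can be run with `python -m unittest <file>`, and a build tool can fail the build whenever that command returns a non-zero exit code.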

Static Code Analysis – This analysis is performed after the build and unit testing, without actually executing the application. It can help to find coding errors and security holes, and it can also ensure conformance to coding guidelines.
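To illustrate what "without executing the application" means, the sketch below parses source code into a syntax tree with Python's standard `ast` module and flags two patterns.  The two rules (calls to `eval` and bare `except` clauses) and the sample source are assumptions chosen for the example; real projects would normally use an established analysis tool.

```python
import ast

# Sample source to analyse; it is parsed as text and never executed.
SOURCE = '''
def load(path):
    try:
        return eval(open(path).read())
    except:
        return None
'''


def find_issues(source: str) -> list:
    """Return findings for two illustrative rules, sorted by message."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        # Rule 1: direct calls to eval() are a common security concern.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
        ):
            issues.append(f"line {node.lineno}: call to eval()")
        # Rule 2: a bare "except:" hides unexpected errors.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            issues.append(f"line {node.lineno}: bare except clause")
    return sorted(issues)


if __name__ == "__main__":
    for issue in find_issues(SOURCE):
        print(issue)
```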

Staging

In the staging phase, full environments are created that mirror the eventual production environment. The following tests are performed:

Integration Testing – Verifies the interfaces between components against software design. Integration testing is an iterative process and facilitates building robust interfaces and system integrity.

Component Testing – Tests message passing between various components and their outcomes. A key goal of this testing could be to verify idempotency, meaning that processing the same message more than once leaves the system in the same state as processing it once. Tests can include extremely large data volumes, edge situations, and abnormal inputs.
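An idempotency check can be sketched as follows, assuming a component that deduplicates messages by ID.  The `InventoryService` class and the message fields are hypothetical and exist only to show the shape of such a test.

```python
class InventoryService:
    """Hypothetical component that consumes stock-adjustment messages."""

    def __init__(self):
        self.stock = {}
        self._seen_message_ids = set()

    def handle(self, message: dict) -> None:
        # Ignore redelivered messages so that processing is idempotent.
        if message["id"] in self._seen_message_ids:
            return
        self._seen_message_ids.add(message["id"])
        sku = message["sku"]
        self.stock[sku] = self.stock.get(sku, 0) + message["quantity"]


def check_idempotent_delivery() -> bool:
    """Deliver the same message twice and confirm the state is unchanged."""
    service = InventoryService()
    message = {"id": "msg-001", "sku": "widget", "quantity": 5}
    service.handle(message)
    state_after_first = dict(service.stock)
    service.handle(message)  # simulated redelivery of the same message
    return service.stock == state_after_first == {"widget": 5}


if __name__ == "__main__":
    print("idempotent:", check_idempotent_delivery())
```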

System Testing – Tests the system end-to-end and verifies whether the software satisfies the business requirements. This might include testing the UI, API, backend logic, and end state.

Performance Testing – Determines the responsiveness and stability of a system as it performs under a particular workload. Performance testing also is used to investigate, measure, validate, or verify other quality attributes of the system, such as scalability, reliability, and resource usage. Types of performance tests might include load tests, stress tests, and spike tests. Performance tests are used for benchmarking against predefined criteria.
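Benchmarking against predefined criteria can be sketched as a check over measured latencies.  The sample latencies, the thresholds and the choice of the nearest-rank percentile method are assumptions made for this illustration; a real load-testing tool would collect the measurements.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_criteria(latencies_ms, p95_limit_ms, max_limit_ms):
    """Pass only if both the 95th percentile and the worst case are within limits."""
    return (
        percentile(latencies_ms, 95) <= p95_limit_ms
        and max(latencies_ms) <= max_limit_ms
    )


if __name__ == "__main__":
    # Illustrative response times in milliseconds, including one spike.
    latencies = [120, 135, 110, 140, 150, 125, 130, 145, 115, 900]
    print("p95:", percentile(latencies, 95))
    print("pass:", meets_criteria(latencies, p95_limit_ms=200, max_limit_ms=1000))
```

Expressing the criteria as code lets the pipeline fail a build automatically when a change degrades performance beyond the agreed limits.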

Compliance Testing – Checks whether the code change complies with the requirements of a nonfunctional specification and/or regulations. It determines if you are implementing and meeting the defined standards.

User Acceptance Testing – Validates the end-to-end business flow. This testing is executed by an end user in a staging environment and confirms whether the system meets the requirement specification. Typically, customers employ alpha and beta testing methodologies at this stage.

Production

Finally, after passing the previous tests, the staging phase is repeated in a production environment. In this phase, a final canary test can be completed by deploying the new code to only a small subset of servers, a single server, or a single Region before deploying it to the entire production environment.
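The canary approach can be sketched as a two-phase rollout: deploy to a small subset, check health, and continue only if the subset is healthy.  The `deploy` and `is_healthy` callables, the 10 percent default and the host names are hypothetical placeholders for whatever the deployment tool actually provides.

```python
import math


def pick_canary_hosts(hosts, fraction=0.1):
    """Select a small subset of hosts (at least one) to receive the new code first."""
    count = max(1, math.floor(len(hosts) * fraction))
    return hosts[:count]


def rollout(hosts, deploy, is_healthy, fraction=0.1):
    """Deploy to canary hosts first; continue only if all of them are healthy."""
    canaries = pick_canary_hosts(hosts, fraction)
    for host in canaries:
        deploy(host)
    if not all(is_healthy(host) for host in canaries):
        # Stop here so the rest of production keeps running the old code.
        return {"status": "halted", "deployed": canaries}
    remaining = [host for host in hosts if host not in canaries]
    for host in remaining:
        deploy(host)
    return {"status": "completed", "deployed": canaries + remaining}


if __name__ == "__main__":
    fleet = [f"web-{i}" for i in range(20)]  # illustrative host names
    log = []
    result = rollout(fleet, deploy=log.append, is_healthy=lambda host: True)
    print(result["status"], len(log), "hosts deployed")
```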
