Continuous Testing
A comprehensive guide to developing a continuous testing plan, implementation, and automation approach for your project using AI tools.
Introduction
Background
Continuous testing (CT) is the practice of automatically and continuously testing code changes throughout the development process to identify and address issues early. The scope of CT includes testing code modules, interactions between software components, security, and user interfaces. Implementing CT is often difficult and time-consuming; we therefore recommend a simplified approach to getting started with CT, using a template and artificial intelligence tools such as large language models to turn a test plan into reality quickly. The goal is to enable your project to identify and fix problems early, before they become major issues, leading to faster releases, improved software quality, and happier users.
Approach
Our recommendation is to follow a 4-step plan for continuous testing: (1) document your test approach, (2) write your tests, (3) automate your tests, and (4) maintain your tests.
The diagram below illustrates the 4-step approach.
Use Cases
- Using artificial intelligence tools like large language models (LLMs) to speed up test writing
- Establishing a continuous testing plan that coordinates your testing goals
- Automating as many of your tests as possible
- Providing frequent test results and feedback for development teams
- Facilitating quicker releases
- Fostering a culture of continuous improvement in software development
Quick Start
Although there are many steps in setting up a continuous testing architecture, the most important step is to document a shared plan of approach for your team.
- If you already have continuous testing enabled for your project: we recommend documenting your plan for internal and external team members' benefit within a TESTING.md. This will help you communicate your testing architecture so that team members can contribute easily to improve code through additional tests and automation. Copy/paste our template below and fill it in with your testing setup. Further steps are detailed in the Step-by-Step Guide section, which we encourage you to follow to make test writing and automation more efficient.
- If you are brand new to continuous testing: we recommend starting with our Step-by-Step Guide to set up an efficient continuous testing plan for your project.

Key Concepts to Get Started with:
- Download and set up a TESTING.md test plan (see example of template in action)
- Download our .pre-commit-config.yaml for a standard code styling, formatting, and linting checker
- Generate unit tests automatically using a Large Language Model (LLM), such as codellama via ollama (see example of unit test script generation)
- Generate system tests automatically using an LLM, such as codellama via ollama and Robot Framework (see example of system test script generation)
- Tie continuous testing scripts together with a continuous integration pipeline
Step-by-Step Guide
This step-by-step guide will help you establish, write, automate, and maintain a continuous testing solution for your project.
1. Create a TESTING.md
We recommend creating a TESTING.md file that outlines the testing objectives and plans for your software in an easy-to-find location. This file will provide your development team and contributors with:
- A list of the types of tests you run against your software.
- Locations where your tests are defined.
- When and how your tests are run.
- How to contribute/modify tests.
Having this information in a single file helps guide your testing journey and adds clarity for your team.
Download our TESTING.md Template (see example of template in action)
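If you want to sketch the file before pulling in the template, a minimal outline could look like the following (the section names here are illustrative; the downloadable template is more complete):

# TESTING.md

## Introduction
A short overview of the project's testing philosophy and goals.

## Testing Categories
### Static Code Analysis
### Unit Tests
### Security Tests
### Build Tests
### Acceptance Tests
(For each category: purpose, test location, how tests are triggered, framework used, and contribution guidelines.)

## Contributing Tests
How to add or modify tests and where to ask questions.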
1.1 Testing Categories
Scroll to the "Testing Categories" section within your TESTING.md file and check off the testing categories relevant to your project from the provided list. You can also add your own if needed. Here are some common testing categories:
- Static Code Analysis: Checks code for syntax, style, vulnerabilities, and bugs.
- Unit Tests: Tests functions or components to verify that they perform as intended.
- Security Tests: Identifies potential security vulnerabilities.
- Build Tests: Checks if the code builds into binaries or packages successfully.
- Acceptance Tests: Validates against end-user and stakeholder requirements.
Once you've identified the testing categories important to your project, write a description for each using the provided template section. Be sure to include:
- The purpose of that testing category.
- The location where the tests are stored.
- How the tests are triggered.
- The framework used for testing.
- Best practices or guidelines for contributing to that testing category.
1.2 Example: Unit Testing
In the "Unit Testing" section, explain how unit tests are structured, where they're stored, and how frequently they run. Include the testing framework used and best practices for writing unit tests. For example:
- Location: /tests/unit
- Purpose: Verify that individual functions or components work as intended.
- Running Tests:
  - Manually:
    - Navigate to the /tests/unit directory.
    - Run the relevant unit testing command (e.g., npm test for JavaScript).
    - Review results in the terminal or generated reports.
  - Automatically:
    - Frequency: Upon every commit or pull request.
    - Results Location: GitHub Actions Unit Tests Workflow
- Framework Used: Jest for JavaScript.
- Tips: Focus on core functions and methods, use mocks for dependencies, and handle edge cases.
1.3 Example: Performance Testing
In the "Performance Testing" section, detail how you ensure your application can handle expected and peak loads. For example:
- Location: /tests/performance
- Purpose: Validate that the application meets performance goals under load.
- Running Tests:
  - Manually:
    - Navigate to the /tests/performance directory.
    - Run the performance testing tool command (e.g., jmeter -n -t test.jmx).
    - Review generated reports for analysis.
  - Automatically:
    - Frequency: Monthly stress tests or before release candidates.
    - Results Location: deployment test server's /var/log/myapp/performance-tests.log
- Framework Used: Apache JMeter.
- Tips: Simulate peak load scenarios, monitor resource usage, and use tools like Chaos Monkey for resilience testing.
2. Write Your Tests
We recommend using Testing Frameworks when writing your tests to automate, organize, and analyze your testing efforts effectively. You can write your test code from scratch, but you can also generate it with large language models (LLMs), specifically using our recommended Test Code Generation Frameworks.
Recent studies show that large language models (LLMs) can generate test code covering up to 85% of scenarios (source: study). Open-source LLM tools like codellama can create initial test code, which developers can refine. Running these models locally addresses data privacy concerns. For the latest advancements, refer to the code model ranking.
Here's our recommended approach to deciding the right model for you to use (see our full list of recommended code generation models):
Is your code open source and permissively licensed?
- Yes: We recommend high-performance cloud-based LLM models (see our recommended cloud-based models)
- No: We recommend using locally run LLM models (e.g., codellama), such as those provided by the Ollama tool, for sensitive data to protect your code from dissemination. (Note: these models may not perform as well as bigger cloud-based models such as GPT-4. We recommend asking the LLM to generate a simple test template for you to fill out.)
2.1 Example: Writing Unit Tests using Downloadable, Open Source Code Generation Models
For unit tests, please follow the steps below:
Download and Install OLLAMA:
- OLLAMA: A streamlined tool for running various LLMs, like llama3 and codellama, locally. Follow the steps to install this tool locally (see the example commands below).
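For example, on Linux the install and model download typically look like this (commands from the Ollama documentation; macOS and Windows use the installers from ollama.com):

# Install Ollama (Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull the codellama model so it is available locally
ollama pull codellama
# Optional sanity check: open an interactive session with the model
ollama run codellama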
Invoke LLM and Generate Test Code:
- Example script:
# calculator.py
class Calculator:
    def add(self, a, b):
        return a + b

    def subtract(self, a, b):
        return a - b

- Example LLM code generation command:
ollama run codellama "$(cat calculator.py) from the above code, write a unit test for the functions add and subtract. do not explain the code. only provide the unit test script. Add inline comments in your tests to clarify the purpose of each test. These comments should include details on the function being tested, the test type (e.g., bug fix, change request, requirements validation, anomaly reports), and any relevant context."
The command above uses ollama to run the codellama model, passing it the content of calculator.py (retrieved with the cat command) inside the quoted prompt. Its purpose is to generate unit tests for the add and subtract functions defined in that file.
We recommend adding inline comments in your tests to clarify the purpose, including details on the function being tested, the test type (e.g., bug fix, change request), and relevant context.
Output looks like this:
import unittest
from calculator import add, subtract

class TestCalculator(unittest.TestCase):
    def test_add(self):
        """
        Test the add function to ensure it correctly adds two numbers.
        Test Type: Requirements Validation
        """
        self.assertEqual(add(2, 3), 5)

    def test_subtract(self):
        """
        Test the subtract function to ensure it correctly subtracts two
        numbers.
        Test Type: Requirements Validation
        """
        self.assertEqual(subtract(5, 2), 3)

Codellama is one example of a model available in the library. You can explore other models at Ollama's library.
Review and Refine Generated Code:
- Developers should review the generated code, fix errors (for example, the import above assumes module-level add and subtract functions, while calculator.py defines a Calculator class), and add any missing edge cases.
Iterate as Needed:
- If necessary, update the prompt and obtain revised test code. Repeat the process until satisfactory (see the illustrative refined test below).
Disclaimer: While LLMs can generate approximately 80% of test code automatically, developers must verify and refine the remaining 20% to ensure comprehensive test coverage.
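As an illustration of the review-and-refine step, a corrected version of the generated test might fix the import and add a couple of edge cases. This is only a sketch based on the calculator.py example above:

import unittest

from calculator import Calculator


class TestCalculator(unittest.TestCase):
    def setUp(self):
        # Use a fresh Calculator instance for every test
        self.calc = Calculator()

    def test_add(self):
        """Add two positive numbers. Test Type: Requirements Validation"""
        self.assertEqual(self.calc.add(2, 3), 5)

    def test_add_negative_numbers(self):
        """Edge case: adding negative numbers."""
        self.assertEqual(self.calc.add(-2, -3), -5)

    def test_subtract(self):
        """Subtract two numbers. Test Type: Requirements Validation"""
        self.assertEqual(self.calc.subtract(5, 2), 3)

    def test_subtract_below_zero(self):
        """Edge case: the result may be negative."""
        self.assertEqual(self.calc.subtract(2, 5), -3)


if __name__ == "__main__":
    unittest.main()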
The following are suggested LLM prompts to use with automated generation.
Example Prompts for Auto-generated Unit Tests:
- Basic Functionality Testing: "Generate unit tests for a function/method that performs basic arithmetic operations (addition, subtraction, multiplication, division)."
- Handling Edge Cases: "Create tests for a function that handles edge cases, such as zero division, boundary values, and unexpected input types."
- String Manipulation: "Generate unit tests for a function that involves string manipulation, including tests for string concatenation, slicing, and length calculations."
- List/Array Operations: "Create tests for functions that operate on lists/arrays, covering scenarios like element addition, removal, and list comprehensions."
- Exception Handling: "Generate unit tests to ensure proper exception handling in functions that may encounter errors. Include tests for both expected and unexpected exceptions."
- Example Generated Unit Test:
% ollama run codellama "Generate unit tests for a function/method that performs basic arithmetic operations (addition, subtraction, multiplication, division)"
import unittest

class TestArithmeticOperations(unittest.TestCase):
    def test_addition(self):
        result = add(2, 3)
        self.assertEqual(result, 5)

    def test_subtraction(self):
        result = subtract(5, 3)
        self.assertEqual(result, 2)

    def test_multiplication(self):
        result = multiply(4, 6)
        self.assertEqual(result, 24)

    def test_division(self):
        result = divide(10, 5)
        self.assertEqual(result, 2)

if __name__ == "__main__":
    unittest.main()
2.2 Example: Writing System Tests using Downloadable, Open Source Code Generation Models and Behavior-Driven Development Frameworks
System-level tests that involve interaction with external software or files can be complicated to write and run. Please refer to the System Testing section within our Testing Frameworks Guide for recommended system testing tools that leverage behavior-driven development (BDD) philosophies. One such example, Robot Framework, is demoed below in combination with large language models. Specifically, using LLMs to auto-generate Robot Framework pseudocode can help streamline the process of creating integration test cases. Complicated system testing scenarios can be simplified and written automatically using LLMs. Here's an example:
Generating Robot Framework Pseudocode with LLM:
- Use Codellama to generate test case pseudocode in Robot Framework syntax.
% ollama run codellama "Generate a Robot Framework test suite, using SeleniumLibrary, that exercises a website's login page: one test case for a valid login that verifies the welcome message, and one test case for invalid credentials that verifies the error message."
*** Settings ***
Documentation     Example test suite
Library           SeleniumLibrary

*** Test Cases ***
Valid Login
    Open Browser    https://dummy-website.com    chrome
    Input Text      username_field    valid_username
    Input Text      password_field    valid_password
    Click Button    login_button
    Page Should Contain    Welcome, User!

Invalid Login
    Open Browser    https://dummy-website.com    chrome
    Input Text      username_field    invalid_username
    Input Text      password_field    invalid_password
    Click Button    login_button
    Page Should Contain    Invalid credentials
Direct Revision and Enhancement:
- Revise the Robot Framework pseudocode as needed:
  - Add additional steps.
  - Include assertions for edge cases.
  - Incorporate custom keywords or libraries.
Test Execution:
- Run the tests locally or integrate them into your CI pipeline.
By combining an LLM's natural language capabilities with Robot Framework's structured format, you can efficiently create and adapt test cases.
Here is an example to generate a Robot Framework script for MFA (Multi-Factor Authentication) login with the following steps:
- Navigate to the login page.
- Fill in the username and password.
- Generate a TOTP (Time-Based One-Time Password).
- Fill in the TOTP code field.
- Click the "Sign in" button.
- Verify that the login was successful.
Here's a simple prompt to generate the script:
"Generate a Robot Framework script to perform MFA (Multi-Factor Authentication) login. The script should navigate to the login page, fill in the username and password fields, generate a TOTP code using the provided secret, enter the TOTP code, click the 'Sign in' button, and verify that the login was successful by checking the welcome message."
Here's the LLM output Robot Framework script:
*** Settings ***
Library           Browser
Library           totp.py
Suite Setup       New Browser    browser=${BROWSER}    headless=${HEADLESS}
Test Setup        New Context
Test Teardown     Close Context
Suite Teardown    Close Browser

*** Variables ***
${BROWSER}     chromium
${HEADLESS}    False

*** Test Cases ***
MFA Login Test
    Open Browser    https://example.com/login    ${BROWSER}
    Input Text      id=username    your_username
    Input Text      id=password    your_password
    ${totp}    Generate TOTP    your_totp_secret
    Input Text      id=totpcode    ${totp}
    Click Button    id=signin_button
    Wait Until Page Contains Element    xpath=//h1[text()='Welcome!']
Here's the test output:
% robot --pythonpath . tests
==============================================================================
Tests
==============================================================================
Tests.Mfa Login
==============================================================================
Login with MFA | PASS |
------------------------------------------------------------------------------
Tests.Mfa Login | PASS |
1 test, 1 passed, 0 failed
==============================================================================
Tests | PASS |
1 test, 1 passed, 0 failed
==============================================================================
Output: MFA_Login/output.xml
Log: MFA_Login/log.html
Report: MFA_Login/report.html
The quality of the generated test code can be compared with the example provided by Robot Framework here. The website name that you are testing needs to be updated, but other than that, it was perfect!
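Note that the generated script loads a local totp.py helper library to provide the Generate TOTP keyword; that helper is not shown in the output above. A minimal sketch of such a helper, assuming the pyotp package is installed, could look like this:

# totp.py -- exposes a "Generate TOTP" keyword to Robot Framework
# (a sketch; assumes the pyotp package: pip install pyotp)
import pyotp


def generate_totp(secret: str) -> str:
    """Return the current time-based one-time password for a base32 secret."""
    return pyotp.TOTP(secret).now()

Robot Framework maps the keyword name Generate TOTP to the generate_totp function automatically.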
3. Automate Your Tests
Our recommendation is to automate as many of your tests as possible using pre-commit, a framework that manages and maintains multi-language pre-commit hooks that can be used on the client side as well as the server (VCS) side.
3.1 Static Test Automation
We recommend setting up static analysis and style checks using a .pre-commit-config.yaml file.
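As a starting point, a minimal configuration for static checks might look like the following (a sketch using commonly used, publicly available hooks; pin each rev to the tag current for your project):

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0  # generic whitespace and syntax checks
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
  - repo: https://github.com/psf/black
    rev: 24.3.0  # Python code formatting
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/flake8
    rev: 7.0.0  # Python linting
    hooks:
      - id: flake8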
3.2 Component Test Automation
Component tests are tests of your immediate code base (an individual file, module, or function) that do not require system-level interaction. Please consult our Testing Frameworks guide for a choice of testing tools we recommend. Once selected, we recommend automating the execution of your tests in both of the following ways:
- Execute tests locally on your developers' machines upon local Git commits
- Execute tests upon Git pushes to given Git branches on your version control system (VCS) - hosted on GitHub.com or alternate
This idea is represented in the following diagram:
To run tests locally upon Git commits, we recommend using pre-commit. Here's how to set it up:
Developers' Machines
Step 1: Install pre-commit on your local machine. If you are using Python, you can install it via pip:

pip install pre-commit
Step 2: Create a .pre-commit-config.yaml file at the root of your repository with the configuration for your Python component tests using PyTest. Here's an example template you can start with:
Python
repos:
  - repo: local
    hooks:
      - id: pytest
        name: PyTest
        entry: pytest
        language: system
        files: '\.py$'
        stages: [commit]

This configuration assumes that you have PyTest installed and set up for your project. The files regex '\.py$' ensures that the pre-commit hook only runs on Python files.
HCL (HashiCorp Configuration Language)
repos:
  - repo: local
    hooks:
      - id: terraform_fmt
        name: Terraform Format
        entry: terraform fmt -check
        language: system
        files: '\.tf$'
        stages: [commit]

This configuration uses Terraform's built-in fmt command (with -check) to verify the formatting of Terraform configuration files. While not a direct component test, it's a common practice to ensure code quality and consistency in HCL-based projects.
JavaScript
repos:
  - repo: local
    hooks:
      - id: jest
        name: Jest
        entry: npm run test
        language: system
        files: '\.(js|jsx)$'
        stages: [commit]

This setup assumes you are using Jest for testing your JavaScript projects. The npm run test command should be configured in your package.json to execute Jest tests. If using TypeScript, replace the line files: '\.(js|jsx)$' with files: '\.(ts|tsx)$'.

Jupyter Notebook
repos:
  - repo: local
    hooks:
      - id: nbtest
        name: Notebook Test
        entry: jupyter nbconvert --to notebook --execute --inplace
        language: system
        files: '\.ipynb$'
        stages: [commit]

This configuration uses Jupyter's nbconvert tool to execute notebooks as a form of testing. It's a basic approach to running tests in Jupyter Notebooks and might need additional tooling or scripts for more comprehensive testing scenarios.
Step 3: Install the pre-commit hook into your Git repository:
pre-commit install
Now, every time you commit changes, your component tests will run automatically on the specified (pattern-matching) files you've staged for commit.
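You can also run the configured hooks on demand, without making a commit, which is handy when first adopting pre-commit or after editing the configuration:

# Run all configured hooks against every file in the repository
pre-commit run --all-files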
Version Control System
For automated execution of component tests upon Git pushes using a VCS, we recommend using GitHub Actions or a Jenkins configuration. To invoke a .pre-commit-config.yaml configuration from GitHub Actions or Jenkins for automated execution of unit tests upon Git pushes, follow these detailed directions:
GitHub Actions
To execute the pre-commit hooks defined in your .pre-commit-config.yaml as part of a GitHub Actions workflow, create a workflow file in your repository that triggers on push events. Here's how to set it up:
Create a Workflow File: Navigate to the .github/workflows directory in your repository. If it doesn't exist, create it.

Define the Workflow: Create a new file named pre-commit-action.yml (or another name of your choosing) in the workflows directory. Add the following content to this file:

name: Pre-commit Hooks
on: [push]

jobs:
  run-hooks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - name: Install pre-commit
        run: pip install pre-commit
      - name: Run pre-commit hooks
        run: pre-commit run --all-files
This workflow checks out the code, sets up Python, installs pre-commit, and then runs all the pre-commit hooks defined in .pre-commit-config.yaml against all files in the repository. Adjust the python-version and setup steps according to your project's needs.
Commit and Push: Commit the workflow file to your repository and push it to GitHub. The workflow will automatically trigger on the next push to your repository.
Jenkins
To run the pre-commit hooks as part of a Jenkins build, you'll need to configure a Jenkins job that checks out your repository and executes the pre-commit hooks. Here's how to do it:
Install Pre-commit on Jenkins: Ensure that pre-commit and any language-specific runtime (like Python, Node.js) are installed on your Jenkins server or within the build environment that will run your job.
Create a New Jenkins Job: In Jenkins, create a new job by selecting "New Item," then choose "Freestyle project," and give it a name.
Configure Source Code Management: Under the "Source Code Management" tab, select "Git" and fill in the repository URL and credentials if necessary.
Add Build Step to Execute Shell: In the "Build" section, add a build step that executes shell commands. Add the following commands:
#!/bin/bash
# Install pre-commit if not already installed; optional based on your setup
pip install pre-commit
# Run pre-commit hooks
pre-commit run --all-files

Save and Run the Job: After configuring the job, save it and run it manually to verify that the pre-commit hooks are executed as expected.
Triggering the Job: You can configure the job to be triggered on each push to your repository by using Jenkins webhooks or polling SCM, depending on your preference and setup.
3.3 System Test Automation
System tests refer to tests that require interaction between multiple components. Not every project has this complexity. To aid in the automation of system tests, we suggest doing the following:
- Have a schedule for running system tests (e.g., nightly, weekly); see the workflow sketch below for one way to implement this
- Ensure software is built and published to repositories:
- Stand-alone components of your software should be independently released, built, or packaged to be published on public repositories
- (Optional) A final, integrated build consisting of multiple components is packaged and published to a public repository
- Pull built or packaged artifacts from repositories and deploy software release (components or single package) to a virtualized environment
- Test the deployed release against a set of specified system tests
The diagram below illustrates this concept (Continuous Testing parts are highlighted in blue):
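For example, the system test schedule mentioned above could be implemented as a scheduled GitHub Actions workflow (a sketch; the cron schedule, test paths, and commands are illustrative):

name: Nightly System Tests

on:
  schedule:
    - cron: '0 6 * * *'  # every day at 06:00 UTC
  workflow_dispatch: {}   # allow manual runs as well

jobs:
  system-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - name: Install test dependencies
        run: pip install robotframework
      - name: Run system tests
        run: robot --outputdir results tests/system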
Types of System Tests
You should outline the types of system tests you plan to implement in your TESTING.md file. We suggest the following types of tests to include:
- Testing for integration
- Interaction between software components
- Interaction with external services, provided files, exchange of messages, etc.
- Testing for security
- See existing SLIM security best practices as part of your software development workflow.
- Testing for performance and load
- Testing user interfaces for gaps and compliance against policies
Example: Integration Test Automation
We recommend the following steps:
- For more information about applying integration testing, take a look at the following external guide from Microsoft Engineering Fundamentals.
- Follow the process in the diagram above for integration test automation.
- Integrate your integration tests into a Continuous Integration (CI) pipeline, which will allow for the automatic execution of tests upon code changes.
Example: Security Test Automation
To aid in security testing automation, we recommend two steps:
- Add security testing to your developers' local coding environment via pre-commit (see the Component Test Automation section above and the example hooks below)
- Enable existing SLIM security best practices as part of your software development workflow.
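For the pre-commit route, security-focused hooks can sit alongside your other checks. For example (a sketch using the detect-secrets and bandit hooks; pin each rev to a current tag):

repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0  # scans staged changes for hard-coded credentials
    hooks:
      - id: detect-secrets
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.8  # static security analysis for Python code
    hooks:
      - id: bandit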
Example: Performance Test Automation
We recommend the following steps for performance test automation:
- Take a look at the following external guide from Microsoft Engineering Fundamentals for more information about applying performance testing.
- Refer to the Testing Frameworks page for performance test tools and frameworks.
- Integrate performance tests into your Continuous Integration and Continuous Delivery (CI/CD) pipeline to enable regular and automated execution of performance tests as part of the software delivery process.
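For example, a pipeline step could invoke JMeter in non-GUI mode and publish the generated report (a sketch; the test plan and output paths are illustrative):

# Run the JMeter test plan headlessly, log results, and generate an HTML report
jmeter -n -t tests/performance/test.jmx -l results/performance.jtl -e -o results/report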
Example: User Interface Test Automation
We recommend the following steps for user interface test automation:
- Take a look at the following external guide from Microsoft Engineering Fundamentals for more information about applying user interface testing.
- Refer to the Testing Frameworks page for user interface test tools and frameworks.
- Set up a CI pipeline to automatically trigger UI test execution upon code changes or at scheduled intervals.
4. Maintain Your Tests
Your tests should be updated, at minimum, upon the following events:
- Code changes (e.g., new features, bug fixes, refactoring): revisit the related tests.
- Upgrades to libraries, frameworks, or testing tools: adjust existing tests as needed.
- Evolving application data: ensure test data remains relevant.
- Periodic code coverage analysis: identify underused or obsolete tests (see the example below).
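For example, in a Python project a periodic coverage check could be run with pytest-cov (a sketch; substitute your own package name):

# Produce a coverage report that highlights untested lines
pip install pytest pytest-cov
pytest --cov=your_package --cov-report=term-missing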
Frequently Asked Questions (FAQ)
Q: How does continuous testing address usability and user interface testing?
A: We acknowledge the importance of usability and UI testing and are actively exploring ways to integrate them seamlessly into our continuous testing model.
Q: Is it necessary to implement all recommended tools, considering resource constraints?
A: We understand the challenges, and thus, we're developing a common subset of tools for all projects, prioritizing their importance for more feasible implementation.
Q: Are certain projects not mature enough for specific tools?
A: Yes, we recognize project maturity levels vary. We recommend waiting until your project reaches an appropriate stage, especially for tools like integration testing.
Q: What phases are prioritized in the continuous testing guidelines?
A: Security, verification and validation, and integration testing are considered essential phases and will be prioritized in our guidelines.
Q: How does licensing factor into the tool selection process, even for open-source tools?
A: Licensing is crucial, and we are actively exploring strategies to address licensing concerns, ensuring compliance even with open-source tools.
Q: Is continuous testing a one-time implementation, or can it be an iterative process?
A: We emphasize iterative implementation for continuous testing success, understanding that refining the process takes time.
Credits
Authorship:
Acknowledgements:
- We are grateful to John Engelke and Dillon Dalton for their insightful comments and feedback, which have greatly improved this work.
- We also appreciate Drew Meyers and Luca Cinquini for providing exemplary best practices for various tests and pre-commit hooks.
Feedback and Contributions
We welcome feedback and contributions to help improve and grow this page. Please see our contribution guidelines.