In our recent blog post Defining the Web3 Security Maturity Model (SMM), we presented an assessment framework that Web3 projects can use to evaluate their current level of security maturity within the Web3 Secure Development Life Cycle (SDLC). The framework presents a list of measurement criteria in each of the four SDLC phases (Design, Develop, Deploy, and Defend). By measuring themselves against each of these criteria, projects can determine whether their security level is Minimum, Improved, or Advanced in each area.
Previous blog posts have covered the Security Maturity Models for the Design, Develop, and Deploy phases of the Web3 SDLC. In this blog post, we categorize the security criteria of the Defend phase into four subdomains: Monitoring, Incident Response, Bug Bounties, and Insurance. We define each subdomain and its criteria, and explain the rationale behind why each was chosen.
Monitoring
Monitoring is the observation of transactions to and from your project to understand expected behavior (operational monitoring) and detect anomalous activity (threat monitoring). Anomalous activity can be intentional and malicious, such as when attackers exploit vulnerabilities, or accidental, such as when undiscovered bugs cause errors and reverted transactions. The patterns of expected behavior learned through operational monitoring make it possible to identify unexpected patterns that pose security risks, such as abnormally large transaction volumes. Beyond exposing risks, operational monitoring also reveals opportunities for product improvements and new features as it uncovers the most common usage trends and patterns. It also presents an opportunity to build trust and transparency with the user community when monitoring data is shared through displays and dashboards.
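As a rough illustration of operational monitoring (all names and thresholds here are invented for this sketch, not part of the SMM), detecting "abnormally large transaction volumes" can be as simple as comparing each new observation against a rolling baseline and flagging large deviations:

```python
from collections import deque
from statistics import mean, stdev

class VolumeMonitor:
    """Flags observations that deviate sharply from a rolling baseline."""

    def __init__(self, window: int = 20, threshold_sigmas: float = 3.0):
        self.history = deque(maxlen=window)
        self.threshold_sigmas = threshold_sigmas

    def observe(self, volume: float) -> bool:
        """Record one observation; return True if it is anomalous vs. the baseline."""
        anomalous = False
        if len(self.history) >= 2:
            mu, sigma = mean(self.history), stdev(self.history)
            # Flag volumes more than N standard deviations away from normal behavior.
            if sigma > 0 and abs(volume - mu) > self.threshold_sigmas * sigma:
                anomalous = True
        self.history.append(volume)
        return anomalous
```

A real deployment would feed this from per-block transaction counts or token-transfer totals; the point is that the "expected behavior" baseline is learned from the data itself.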
Within the Monitoring subdomain, the following criteria are directed towards developers, support personnel, and incident response personnel.
Coverage Scope (Operational/Threat)
- Definition: The types and numbers of threats or system health conditions being monitored.
- Rationale: Awareness of how your project is operating, and the ability to know which conditions are normal and which may be of concern, are directly related to how many different conditions you can monitor. Both threat monitoring and operational monitoring are necessary to gain a complete picture. While a core level of security may be achieved by monitoring only for specific known threats, an important additional layer of security comes from monitoring your project’s operational conditions to understand its normal behavior patterns as well.
- Minimum: Contract is monitored for unexpected conditions such as errors and excessive reverts.
- Improved: Administrative events are monitored. Contract health is monitored for expected conditions.
- Advanced: All threats identified in the threat model are monitored.
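A minimal sketch of the "excessive reverts" check in the Minimum criterion, assuming you already collect per-transaction receipt statuses from your node (the dictionary shape here is illustrative, not a real client API):

```python
def revert_rate(receipts: list[dict]) -> float:
    """Fraction of transactions in a batch that reverted (status == 0)."""
    if not receipts:
        return 0.0
    reverted = sum(1 for r in receipts if r["status"] == 0)
    return reverted / len(receipts)

def excessive_reverts(receipts: list[dict], max_rate: float = 0.2) -> bool:
    """Alert when more than `max_rate` of recent transactions reverted."""
    return revert_rate(receipts) > max_rate
```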
Monitoring Platform
- Definition: The infrastructure on which monitoring systems run.
- Rationale: The choice of monitoring platform impacts reliability and the level of automation you can incorporate into your monitoring system. An in-house monitoring platform will also carry with it the technical burden and cost of keeping the monitoring infrastructure running. Dedicated third-party monitoring platforms manage all of these issues for a fee.
- Minimum: Monitoring is manually performed. Monitoring consists of viewing transactions on block explorers.
- Improved: Monitoring uses existing monitoring templates. Monitoring is scripted on the project’s own infrastructure.
- Advanced: Monitoring is automatic. Monitoring runs on dedicated third-party monitoring platforms.
Data Sources
- Definition: The channels of information that can be used to monitor a project.
- Rationale: Some data sources can only provide historical information about a project, while others can provide a window into operations that have not yet been finalized. Having access to real-time data that allows you to alter or prevent problems before they occur provides greater security than simply observing historical data after the fact.
- Minimum: Data feeds are manual (e.g., viewing web pages).
- Improved: Automated data feeds are available through remote APIs and/or JSON-RPC interfaces.
- Advanced: Real-time mempool data is integrated into monitoring data feeds.
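Automated data feeds at the Improved level typically poll a node's JSON-RPC interface. The request body below follows the JSON-RPC 2.0 convention used by Ethereum clients; in practice it would be POSTed over HTTPS to your node or provider endpoint (not shown here):

```python
import json

def rpc_request(method: str, params: list, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request body, e.g. for eth_getBlockByNumber."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    })

# Request the latest block with full transaction objects included.
body = rpc_request("eth_getBlockByNumber", ["latest", True])
```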
Notification Mechanisms
- Definition: The mechanisms used by the monitoring system to alert project members when anomalies are detected.
- Rationale: Monitoring for anomalies is only useful if action is taken to resolve problems as they arise. This is only possible if the monitoring system has mechanisms for alerting project members when anomalies are detected. Notification mechanisms can range from simple log files that are manually monitored, to automated emails and text messages, to graphical dashboards displaying real-time data.
- Minimum: Monitoring alerts are logged and viewable to developers.
- Improved: Monitoring automatically raises alerts to developers.
- Advanced: Dashboards are used to convey additional monitoring data.
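The progression from logged alerts to automatic notification can be sketched as a severity-based dispatcher (the channel names and routing table are invented for illustration):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("monitoring")

# Hypothetical routing table: each severity maps to one or more channels.
ROUTES = {
    "low": ["log"],
    "medium": ["log", "email"],
    "high": ["log", "email", "pager"],
}

def dispatch_alert(severity: str, message: str) -> list[str]:
    """Record the alert and return the channels it should be pushed to."""
    channels = ROUTES.get(severity, ["log"])
    logger.info("[%s] %s -> %s", severity, message, channels)
    return channels
```

At the Minimum level everything lands in the log; the Improved level adds the push channels, and a dashboard would read from the same alert stream.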
Incident Response
Once the security risks and system health concerns of your project have been identified and are being monitored, an incident response plan should be created for each potential anomaly. When monitoring reveals a potential issue, incident response measures aim to understand the issue, contain it, limit further damage, and restore desired functionality. Additional goals of incident response include communication with affected parties and follow-up to apply lessons learned.
Incident Response cuts across several groups relevant to a protocol. Within the Incident Response subdomain, the following criteria are directed towards developers, quality assurance personnel, internal and external security teams, and relevant administrative stakeholders.
Triage
- Definition: The process of evaluating monitoring alerts and/or active security vulnerabilities to decide what course of action should be taken and its level of urgency.
- Rationale: When a potential issue is looming or has already occurred, time is often of the essence in preventing further loss or damage. The less familiar a person is with the system and the potential impacts of incidents, the more difficult it will be to assess the right course of action. This is why the triage process benefits most from including key project personnel and Web3 security experts.
- Minimum: Triage is performed by developers. Developers decide appropriate responses.
- Improved: Triage is performed by an internal security team. Triage findings are communicated to project decision makers who decide appropriate responses.
- Advanced: A Web3 security firm is on retainer to assist with triage.
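One common way to structure triage decisions is a severity matrix over impact and likelihood. The scoring and action names below are an invented example, not a prescribed SMM formula:

```python
IMPACT = {"low": 1, "medium": 2, "high": 3}
LIKELIHOOD = {"unlikely": 1, "possible": 2, "active": 3}

def triage(impact: str, likelihood: str) -> str:
    """Map an alert to a response urgency based on impact x likelihood."""
    score = IMPACT[impact] * LIKELIHOOD[likelihood]
    if score >= 6:
        return "page-on-call"   # e.g., an actively exploited high-impact issue
    if score >= 3:
        return "open-incident"
    return "track-in-backlog"
```

Whoever performs triage (developers, an internal security team, or an external firm on retainer), an explicit matrix like this keeps decisions consistent under time pressure.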
Automation
- Definition: The ability to execute Incident Response Plans (IRPs) without human intervention.
- Rationale: Automated processes are more likely to be carried out consistently, thoroughly, and with fewer mistakes than manual processes. The greatest benefit is achieved when automated IRPs are triggered directly by the monitoring system. However, there are response scenarios where some level of human intervention is preferable, such as when the IRPs themselves may negatively affect users (for example, pausing a smart contract to perform further triage). Having the input of qualified experts available throughout the decision-making process is a key benefit.
- Minimum: Incident response plans (IRPs) are manually performed.
- Improved: IRPs are automated. IRPs must be manually triggered.
- Advanced: IRPs are automatically triggered by the monitoring system.
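The automation levels above can be modeled as IRPs that are either run by hand or wired directly to monitoring alerts. This is a sketch with invented names; a real system would invoke your contracts' pause or guardian functions where the placeholder lambdas are:

```python
from typing import Callable

class IRPRegistry:
    """Maps alert types to response plans, with a per-plan auto-trigger policy."""

    def __init__(self):
        self._plans: dict[str, tuple[Callable[[], str], bool]] = {}

    def register(self, alert_type: str, plan: Callable[[], str], auto: bool):
        self._plans[alert_type] = (plan, auto)

    def on_alert(self, alert_type: str) -> str:
        """Called by the monitoring system when an alert fires."""
        plan, auto = self._plans[alert_type]
        if auto:
            return plan()  # Advanced: monitoring triggers the IRP directly
        return "awaiting-manual-trigger"  # Improved: a human must run the plan

registry = IRPRegistry()
registry.register("oracle-deviation", lambda: "paused-oracle-consumers", auto=True)
registry.register("admin-key-change", lambda: "escalated-to-multisig", auto=False)
```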
Testing and Updates
- Definition: The process of updating IRPs to reflect changes in the project, and regularly testing the IRPs to ensure that they still work as expected.
- Rationale: When changes are made to a project’s code, dependencies, or operating environment, the incident response procedures that formerly worked can fail under the new conditions. There is no worse time to discover that your response procedures are broken than in a moment of crisis. Therefore, projects should have repeatable test suites that check all IRPs against expected outcomes. At a minimum, the tests should be run every time there is a project change, but ideally, the tests should be run on a regular basis even when no changes have occurred to account for unknown variables not directly under the project’s control.
- Minimum: IRPs are updated whenever changes are made to the project. IRPs are tested after they are changed.
- Improved: A project’s threat landscape is periodically reviewed, and IRPs are updated to reflect all new threats. IRPs are tested after they are changed.
- Advanced: IRPs are regularly tested even when no project changes have occurred.
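A repeatable IRP test suite can be as simple as replaying each plan against a simulated starting state and checking the expected end state. Everything here is illustrative; in practice each plan would run against a fork or staging deployment of the protocol:

```python
def pause_contract(state: dict) -> dict:
    """Hypothetical IRP step: flip the protocol into its paused state."""
    return {**state, "paused": True}

def run_irp_tests() -> list[str]:
    """Replay each IRP against a known starting state; report any failures."""
    failures = []
    cases = [
        # (name, plan, starting state, expected end state)
        ("pause-on-exploit", pause_contract, {"paused": False}, {"paused": True}),
    ]
    for name, plan, start, expected in cases:
        if plan(start) != expected:
            failures.append(name)
    return failures
```

Because the suite is just code, it can run in CI on every project change and on a schedule even when nothing has changed.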
Root Cause Analysis and Communication
- Definition: The post-incident Root Cause Analysis (RCA) and communication of findings to affected parties.
- Rationale: Analyzing the root cause of incidents is extremely important because problems that occur once are likely to occur again. This is true whether talking about system performance issues or malicious attacks. It is also critical to communicate findings with all affected parties. Internal communications allow key project members to be part of the decision-making process for preventing future incidents, while external communications build trust and transparency with a project’s user community.
- Minimum: After incidents, Root Cause Analysis (RCA) is performed by developers. Findings are communicated internally.
- Improved: RCA is performed by a multi-disciplinary internal team (designers, developers, etc.). Findings are communicated to the public.
- Advanced: RCA is performed by a professional Web3 security firm. Findings are communicated to the public.
Bug Bounties
Bug Bounties are rewards that projects offer to people who report bugs or vulnerabilities in a project’s code. Because security-minded companies may have already spent considerable time and expense eliminating as many issues as possible from their design and code, any issues that remain may be challenging to find. Bug bounty programs provide an attractive incentive for white hat hackers and security researchers to search for any remaining vulnerabilities.
Within the Bug Bounties subdomain, the following criteria are directed towards management and finance personnel. Developers will also play a part in deciding the scope of code to be included in bounties.
Platform and Reach
- Definition: The location where bounty information is hosted and how it is communicated to the security research community.
- Rationale: The effectiveness of bug bounties increases in proportion to the number of qualified security researchers participating in the process. Projects can rely on their own website or social media channels to get the word out about new bounties, but dedicated bug bounty platforms likely have the widest reach to the intended audience.
- Minimum: Bounty program information is available on the company website. A general purpose communication line exists which can be used to communicate bounty findings.
- Improved: Bounty programs are regularly communicated through company social media platforms. Dedicated security communication lines exist for bounty findings.
- Advanced: Bounty programs are run on dedicated bounty platforms.
Scope
- Definition: The parts of a project’s code and dependencies that are included in the bounty.
- Rationale: Scoping determines which system components are scrutinized by bug bounty participants. If components are excluded, they may not receive attention to uncover bugs. While some bounties focus on just the on-chain components of a Web3 project, others include off-chain components as well. Since bugs in off-chain components can impact the security of your protocol, if your project has such components, there is a benefit to including them in bug bounties.
- Minimum: Bounties are offered for select on-chain components.
- Improved: Bounties are offered for all on-chain components.
- Advanced: Bounties are offered for all on- and off-chain components.
Open Bounties and Contests
- Definition: The types of bug bounty programs your protocol participates in. Open bounties and contests are the two main types of bug bounties. Open bounties are bug bounties that run perpetually, as opposed to contests, which usually have a defined start and end date.
- Rationale: Different bug bounty styles have different advantages, and using more than one style can yield greater benefits. When a project has an upcoming release of new or changed code, they may want to attract as much attention as possible to the bounty process before the code’s release date. Contests are a great way to generate interest and excitement within a specific time frame. On the flip side, projects may also want to leave a perpetual bounty in place to continually incentivize researchers to analyze the code.
- Minimum: Open bounties are perpetually offered.
- Improved: Contests are run for major project changes in addition to open bounties.
- Advanced: Contests are run whenever changes are released. Contests are led by senior security auditors.
Rewards
- Definition: The prizes (usually monetary) offered to researchers who find and report vulnerabilities.
- Rationale: Financial rewards have proven to be a strong incentive in motivating people to search for bugs. When deciding how big the rewards offered should be, a general guideline is to set the reward amount as a percentage of the value that could have been lost had the vulnerability been exploited before it was fixed. This value is therefore directly proportional to the total value locked (TVL) in the project.
- Minimum: Reward is up to 5% of the value potentially lost through the discovered vulnerability.
- Improved: Reward is up to 10% of the value potentially lost through the discovered vulnerability.
- Advanced: Reward is up to 20% of the value potentially lost through the discovered vulnerability (max 10% of project TVL).
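The Advanced reward tier can be expressed directly as a capped percentage calculation. This is a sketch of the arithmetic only, not financial guidance; the function name and parameters are our own:

```python
def bounty_reward(value_at_risk: float, tvl: float,
                  pct: float = 0.20, tvl_cap_pct: float = 0.10) -> float:
    """Advanced tier: up to 20% of the value at risk, capped at 10% of TVL."""
    return min(pct * value_at_risk, tvl_cap_pct * tvl)
```

For example, a vulnerability that put $1M at risk in a $10M-TVL protocol would pay up to $200K, while a $50M-at-risk finding would hit the $1M TVL cap.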
Insurance
Insurance is protection purchased from Web3 insurance companies that goes into effect when losses occur from exploited vulnerabilities. Insurance replaces some or all of the losses that result from hacks, scams, or operational mistakes. It serves as a final line of defense in a comprehensive security plan, protecting a project and its users by preparing for worst-case scenarios.
Within the Insurance subdomain, the following criteria are directed towards management and finance personnel.
Coverage Level
- Definition: The amount of losses that are replaced by insurance in the event that losses are incurred.
- Rationale: The higher the coverage level, the more confidence a project’s users will have to participate in the project. Higher coverage levels can also ensure a project’s ability to stay in business in the face of massive losses that would otherwise cripple the business and drive away future customers. Because the cost of insurance typically increases with the coverage level, the potential for losses must be carefully balanced against a project’s financial needs and goals.
- Minimum: Insurance covers a fixed dollar amount (to be divided proportionally among users).
- Improved: Insurance covers 100% of user investments.
- Advanced: Insurance covers all potential losses (includes user investments and company losses).
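The Minimum tier's "fixed dollar amount divided proportionally among users" works out to a simple pro-rata split, sketched below (function and field names are our own):

```python
def pro_rata_payout(coverage: float, user_losses: dict[str, float]) -> dict[str, float]:
    """Split a fixed coverage amount among users in proportion to their losses.

    If coverage exceeds total losses, each user is simply made whole.
    """
    total = sum(user_losses.values())
    if total == 0:
        return {user: 0.0 for user in user_losses}
    scale = min(1.0, coverage / total)
    return {user: loss * scale for user, loss in user_losses.items()}
```

With $100 of coverage against $400 of total losses, each user recovers a quarter of what they lost; the Improved and Advanced tiers push that scale factor to 1.0 and beyond user funds alone.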
Why You Can Trust Arbitrary Execution
Arbitrary Execution (AE) is an engineering-focused organization that specializes in securing decentralized technology. Our team of security researchers leverages its offensive security expertise, tactics, techniques, and hacker mindset to help secure the crypto ecosystem. In the two years since the company’s inception, Arbitrary Execution has performed more than 50 audits of Web3 protocols and projects, as well as created tools that continuously monitor the blockchain for anomalous activity. For more information on Arbitrary Execution's professional services, contact firstname.lastname@example.org. Follow us on Twitter and LinkedIn for updates on our latest projects, including the Web3 SDLC.