Getting Started With RunWhen
We are excited that you are interested in starting a Proof-of-Concept (PoC) with our team! This post is a collection of best practices we hope you will find useful.
This post assumes you already have a project in mind where we can help. Many of our projects center on a few themes:
- Accelerating an application modernization / infrastructure modernization / cloud migration project
- Reducing reliance on outsourced RunOps consultancies with low service quality
- Automating alert and ticket response to free up your top platform engineers
- Re-thinking a potential runbook automation project
- Supporting application developers of you-build-it-you-run-it services
If you'd like to brainstorm together, book a meeting with our Founder here.
1. Try The Tutorials In Our Sandbox
20-60 minutes
The RunWhen tutorials are the fastest way to see the RunWhen Platform in action. They can be found here. Each one takes 5-15 minutes. They demonstrate a range of capabilities, from self-service troubleshooting to using Digital Assistants to respond to your existing alerts or new SLO violations.
Can our team help? The next few steps may require reviews, PoC/PoV targets, business cases, management presentations, approvals, etc. as part of building the consensus you need. We can share best practices and provide materials from a library we have built up working with a variety of organizations. Click here to book a short meeting.
2. Install RunWhen Local
10-30 minutes
RunWhen Local is our in-cluster agent. The getting started guide can be found here. It serves several purposes, but for now, use RunWhen Local to scan your Kubernetes clusters: anything from a single namespace to a multi-cluster environment. It compares the resources it finds with scripts in RunWhen Authors' libraries and creates matches for you to review. The output is a) a stand-alone MkDocs server so you can browse the troubleshooting scripts that matched, and b) a set of Workspace configuration files that the tool can upload.
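Conceptually, the discovery step is a join between what the scan finds and what the script libraries know how to troubleshoot. The sketch below illustrates that idea in plain Python. It is not RunWhen Local's implementation: the `Resource` and `Script` types, the `applies_to` field, and the script names are all hypothetical, and the real matching rules may consider much more than the resource kind.

```python
# Hypothetical data model for illustration only; RunWhen Local's actual
# matching rules and file formats are not shown here.
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str       # e.g. "Deployment", "Service"
    name: str
    namespace: str


@dataclass(frozen=True)
class Script:
    name: str
    applies_to: frozenset  # resource kinds this (hypothetical) script can troubleshoot


def match_resources(resources, library):
    """Pair each discovered resource with every script that lists its kind."""
    matches = []
    for res in resources:
        for script in library:
            if res.kind in script.applies_to:
                matches.append((script.name, f"{res.namespace}/{res.name}"))
    return matches


if __name__ == "__main__":
    library = [
        Script("deployment-health", frozenset({"Deployment"})),
        Script("namespace-events", frozenset({"Namespace"})),
    ]
    found = [
        Resource("Deployment", "checkout", "shop"),
        Resource("Service", "checkout", "shop"),  # no script covers Services here
    ]
    print(match_resources(found, library))  # [('deployment-health', 'shop/checkout')]
```

Resources with no matching script simply produce no entries, which mirrors the idea of reviewing only the matches the scan creates.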
Is it sending my data anywhere? RunWhen Local is an open source project with over 9,000 downloads. It operates in an air-gapped mode until you explicitly start uploading files to a RunWhen Workspace.
3. Create Your Pilot Workspace And Upload
10 minutes
Coming back to the RunWhen Platform, log in to create a new, empty Workspace that can receive uploaded configuration files. The details on how to upload the files from RunWhen Local to your new Workspace are here. At the end of this process, you will have a Workspace that you can demo to colleagues, though it will not yet have the credentials Digital Assistants need to run Tasks (see below).
Working on a business case? Our team (or our professional services partners) can help build a business case for moving forward. We have templates and benchmarking data for areas such as accelerating application/infrastructure modernization projects and reducing their cost, reducing the cost of outsourced 24/7 RunOps, freeing up senior engineers from repetitive alert and ticket responses, or improving you-build-it-you-run-it support. Click here to book a session with our team on this topic.
4. Running Your First Tasks
By default, a Workspace cannot access your clusters. While Digital Assistants can suggest Tasks, they cannot run them or raise Issues. To run Tasks, you need to either i) upload a kubeconfig to the RunWhen Vault instance (fastest), ii) configure the RunWhen Local Runner to ask for control signals from the RunWhen SaaS Platform (second fastest), or iii) self-host the entire RunWhen stack on a Kubernetes cluster (most involved). For (ii) and (iii), we have technical staff who can help you work through software topologies and advanced security considerations.
Security review? If you require a formal security review, our team has a library of materials that can help. Note also that our architecture ensures enterprise data is not sent to external LLMs for use in LLM training. See our Security and Trust overview here.
5. Inviting A Few Colleagues To Your Pilot Workspace
3-4 hours
We strongly recommend inviting colleagues from both platform/SRE and application development teams to participate early in the process. This typically takes the form of one-hour workshop sessions where you help them log in to the Workspace you have created. You can create tours and tutorials for them to follow. The most successful workshops we have seen include a short follow-up session 3-7 days later to brainstorm points that could be added to the map. Note that the RunWhen team (pre-sales), RunWhen professional services partners (services engagement), or RunWhen Authors (bounties) can assist in authoring scripts.
"Generics"? Before authoring new scripts, most teams at this stage simply use our "generics" Code Collection. These "generics" are scripts that you can quickly configure with REST API calls, GRPC calls, SQL queries, cloud CLI commands, Kubectl commands, etc. They take only a few minutes to add to any map and are a powerful tool for you and your colleagues.
6. First Production Workspace, Customization and Roll-Out
1-4 weeks
If at this point your organization chooses to move forward, we typically recommend starting a fresh "production" Workspace and running the RunWhen Local discovery and upload process again. Inviting colleagues, particularly application developers, in cohorts and repeating the cycle of a workshop followed by a brainstorm of points to add typically works well as part of the roll-out strategy. Rather than trying to customize a Workspace extensively before rolling out, we strongly recommend a process of continuous improvement where you use <a href="https://docs.runwhen.com/public/runwhen-platform/feature-overview/runsessions/runsession-reports">analytics from the Workspace Reports</a> to identify gaps in automation coverage. Continuous improvement steps may involve areas such as:
- Adding Tasks to the Workspace specific to application dev or test flows, e.g. collecting diagnostics beyond the defaults that your app developers use for bug fixing
- Connecting Digital Assistants in the Workspace with a set of existing Alerts from AlertManager, PagerDuty, OpsGenie, Datadog, etc. (see docs here)
- Connecting Digital Assistants in the Workspace with your ticketing system so they can enrich tickets with links to RunWhen Reports (see docs here)
- Rolling out the RunWhen Slack application to lower the bar for self-service troubleshooting
- Refining the default SLOs (e.g. namespace health) or adding SLOs for customer- or partner-facing metrics
- Adding an Error Budget Review and/or Automation Coverage Review to your monthly operations meetings using RunWhen analytics
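An Error Budget Review usually starts from simple arithmetic: an SLO target over a window implies a fixed amount of allowed "bad" time. The sketch below assumes a time-based availability SLO over a 30-day window (both assumptions; your SLOs may instead be defined over request counts or other windows) and is not tied to any RunWhen API. For example, 99.9% over 30 days allows (1 - 0.999) × 30 × 24 × 60 = 43.2 minutes.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of 'bad' time a time-based availability SLO allows in the window."""
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, bad_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - bad_minutes) / budget


if __name__ == "__main__":
    print(round(error_budget_minutes(0.999), 1))    # 43.2 minutes per 30 days
    print(round(budget_remaining(0.999, 10.8), 2))  # 0.75 -> a quarter of the budget spent
```

Tracking `budget_remaining` per SLO month over month is one simple way to frame the review discussion.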
7. Value Checkpoint
1/2 day
Consider setting a 'value checkpoint' 2-3 months after your initial cohort of users has been invited to your first production Workspace. The RunWhen Workspace analytics export can snapshot the usage data collected in your Workspace to Google Sheets, Excel, CSV, and many other formats. A template Google Sheets / Excel workbook is linked in the application, with customizable charts and PivotTables showing usage and time saved by user, by group, by cloud resource, etc. Automation coverage reports can pinpoint specific next steps from live Run Sessions where automation is missing, and can even project how much time future automation could save.
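If you prefer to work with a CSV export programmatically rather than in the template workbook, a pivot such as "time saved by user" takes only a few lines of Python. The column names here (`user`, `group`, `resource`, `minutes_saved`) are hypothetical placeholders, not the documented export schema; check the headers of your actual export and adjust.

```python
import csv
import io
from collections import defaultdict


def minutes_saved_by(export_csv: str, key: str = "user") -> dict:
    """Sum the (hypothetical) 'minutes_saved' column, grouped by column `key`."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(export_csv)):
        totals[row[key]] += float(row["minutes_saved"])
    return dict(totals)


if __name__ == "__main__":
    sample = (
        "user,group,resource,minutes_saved\n"
        "alice,platform,checkout,30\n"
        "bob,app-dev,checkout,15\n"
        "alice,platform,payments,20\n"
    )
    print(minutes_saved_by(sample))           # {'alice': 50.0, 'bob': 15.0}
    print(minutes_saved_by(sample, "group"))  # {'platform': 50.0, 'app-dev': 15.0}
```

Passing a different `key` (e.g. `"resource"`) gives the other breakdowns mentioned above without changing the function.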
At any step along the way, our team is here to help. We learn from every engagement, and consider every interaction with you at any step of the journey to be valuable regardless of the outcome. I look forward to working with you. -Kyle Forster