StevenBarre
Repos
17
Followers
2

Collection of platform related tools and configurations

11
28

Gitbook URL of WIKI

16
18

OpenShift 3 and 4 product and community documentation

608
1341

Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain. Automate everything from code deployment to network configuration to cloud management, in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com.

54851
21575

Battlesnake Docs and API Reference

13
20

A simple Battlesnake written in Python.

0
0

Events

Review ROSA todo

Pruner is handled by RH https://github.com/openshift/managed-cluster-config/tree/master/deploy/sre-pruning

Can't push metrics to AWS from Ansible, but can call the CLI https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html#publishingDataPoints https://awscli.amazonaws.com/v2/documentation/api/latest/reference/cloudwatch/put-metric-data.html
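A minimal sketch of the CLI route mentioned above, assuming the AWS CLI is installed and has credentials on the host running the play. The namespace and metric name are hypothetical placeholders, not names used by the platform. The function only builds the argument list for `aws cloudwatch put-metric-data`; it does not call AWS.

```python
import shlex


def build_put_metric_command(namespace, metric_name, value, unit="Count"):
    """Return the argv list for one `aws cloudwatch put-metric-data` call."""
    return [
        "aws", "cloudwatch", "put-metric-data",
        "--namespace", namespace,
        "--metric-name", metric_name,
        "--value", str(value),  # the CLI takes the data point as a string
        "--unit", unit,
    ]


# Hypothetical namespace and metric name, for illustration only.
cmd = build_put_metric_command("Platform/ROSA", "PrunedImages", 42)
print(shlex.join(cmd))
```

A list like this could be handed to the `argv` parameter of Ansible's `command` module, or run locally with `subprocess.run(cmd, check=True)`.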

Could use CHES for sending emails from IRN.

Created at 13 minutes ago
issue comment
Vault doc update

Should this get moved out of the how-to github and into https://beta-docs.developer.gov.bc.ca/ ?

Created at 48 minutes ago
Investigate user-defined alerting to ensure it doesn't alert the on-call team

Describe the issue This rule https://github.com/bcgov-c/platform-ops/blob/c7e66db40ba469829f6ede77ea7bb8df109f7435/roles/config-infra/templates/alertmanager.yaml.j2#L19-L21 was added to the AlertManager config to hide user-defined alerts from the Ops team, but a recent test seems to show that it no longer catches them.

What is the Value/Impact? Prevent product teams from creating alerts that wake up on-call

What is the plan? How will this get completed? Create a user-defined alert (a PrometheusRule) that is firing, then test that it doesn't alert on-call.

Identify any dependencies May need to open a case with Red Hat for advice.

Definition of done Solution in place to prevent users from making alerts that go to on-call.
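One way to get a firing user-defined alert for the test in the plan is a PrometheusRule whose expression always returns a sample. The sketch below is only an illustration: the rule name, namespace, alert name, and severity label are hypothetical, and it assumes user workload monitoring is enabled on the cluster. It prints the manifest as JSON, which `oc apply -f -` accepts.

```python
import json


def always_firing_rule(name, namespace):
    """Build a PrometheusRule manifest whose single alert fires immediately."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "groups": [{
                "name": f"{name}.rules",
                "rules": [{
                    "alert": "UserDefinedTestAlert",  # hypothetical alert name
                    "expr": "vector(1)",  # always returns a sample, so the alert fires
                    "labels": {"severity": "warning"},  # assumed label, adjust for the test
                    "annotations": {"summary": "Test alert for on-call routing"},
                }],
            }],
        },
    }


# Hypothetical rule name and project namespace.
manifest = always_firing_rule("oncall-routing-test", "example-dev")
print(json.dumps(manifest, indent=2))
```

Because the rule has no `for` clause, the alert should move to firing on the first evaluation, which keeps the routing test short.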

Created at 2 hours ago
Review ROSA todo

Need docs on how to install a cluster

Need docs on how to connect to the web console

Created at 1 day ago
Review ROSA todo

Reviewing the list of tools we install

  • etcd backup
    • should be covered by RH managed team
  • image-registry-notifier
    • how do we send email from AWS?
  • Nagios
    • Nagios isn't accessible outside the datacenter.
    • Much of the monitoring isn't needed since this is a managed service
    • Need to look into using CloudWatch as the destination for long-term metrics
  • perfmon
    • Should be able to get this running without special perms or changes
  • Pruner
    • Opened https://access.redhat.com/support/cases/#/case/03329035 to ask RH what they do for build/deploy pruning
Created at 1 day ago
Document how to install ROSA in Gov SEA

Describe the issue There are some custom steps needed to install ROSA into the BC Gov SEA. Document the needed steps.

What is the Value/Impact? Easy to install more clusters in the future.

What is the plan? How will this get completed? Add doc to https://github.com/bcgov-c/advsol-docs

Identify any dependencies Need help from RH Phil

Definition of done PR merged

Created at 1 day ago
OCP 4.10 Install Web Terminal Operator via CCM in CLAB

https://access.redhat.com/support/cases/#/case/03310194 documents some needed NetPol to get this working

Created at 1 day ago
OCP 4.10 Review Release Notes | EPIC

Describe the issue Have the team review the Release Notes for OCP 4.10 for any impacting or exciting changes that need to be communicated to the community.

What is the Value/Impact? Ensure the product teams are kept informed about changes they need to be aware of.

What is the plan? How will this get completed? Have each team member review the Release Notes, then meet to discuss what they found and what needs to be communicated. Help Olena create a slide for the community meetup, and an email via Mautic (if required).

Identify any dependencies Everyone needs to read the notes before the meeting can happen.

Definition of done Community informed of upcoming changes to the clusters.

Created at 1 day ago
KLAB2 - NSX-T Integration Proof of Concept with OpenShift 4

Describe the epic Top level epic to track all development and operational work put into the NSX-T Proof of Concept testing with OpenShift. Sub tasks may end up in multiple epics for better tracking and grouping.

Additional context https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.1/ncp-openshift/GUID-5597CF1F-62FC-4F36-BDFC-D1FCB3BA95BB.html https://www.vrealize.it/2021/03/24/nsx-t-ncp-integration-with-openshift-4-6-the-easy-way/

Definition of done

  • [x] All required development work is done to set up KLAB2 with all the configuration settings and integrate it with CCM and all security tools, making it identical to the KLAB cluster except for the SDN technology.
Created at 1 day ago
SDN - Build and Configure Production cluster EMERALD for Early Adopters

Describe the epic Top level epic to track all development and operational work put into the NSX-T production cluster EMERALD. Sub tasks may end up in multiple epics for better tracking and grouping.

Definition of done

  • [x] Confirm the KLAB2 rebuild and re-deployment was successful, since EMERALD will build on the KLAB2 design work.
  • [x] Build and Configure EMERALD so that it is fully functional and integrated with Project Registry to allow Early Adopters to start using the cluster.
Created at 1 day ago
Test Installation instructions and successfully create ROSA cluster

Describe the issue A clear and concise description of what you want to happen.

Additional context Add any other context, attachments or screenshots

Definition of done

  • [x] DXC team member recreates ROSA cluster via instructions/documentation provided by Phil
Created at 1 day ago
Test Installation instructions and successfully create ROSA cluster

This was completed in August

Created at 1 day ago
Fix MCS_AlertManager_Volume Nagios metric

Describe the issue The Nagios graph for MCS_AlertManager_Volume appears to have no data anymore.

https://github.com/bcgov-c/platform-tools/blob/989fad82920296c5c1e376475115fc416c15332a/nagios/runner/project/roles/nagios/tasks/long-term-metrics.yaml#L502

What is the Value/Impact? No graph showing long term changes in usage

What is the plan? How will this get completed?

  • Investigate the prom query used to get the data.
  • Test a fix in lab
  • Roll out updated Nagios to all clusters
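For the first step of the plan, the query can be checked directly against the Prometheus HTTP API (`/api/v1/query`). This is a minimal sketch, not the code the Nagios role uses: the base URL and the PromQL expression are placeholders, and authentication is left out. An empty result list from the real query would be consistent with the graph showing no data.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for a Prometheus instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def extract_values(response_text):
    """Return (labels, value) pairs from an instant-query JSON response."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    return [(s["metric"], float(s["value"][1])) for s in body["data"]["result"]]


# Placeholder host and query; substitute the expression from long-term-metrics.yaml.
url = instant_query_url("https://prometheus.example", "up")

# Shape of a successful response with no matching series.
empty = '{"status": "success", "data": {"resultType": "vector", "result": []}}'
print(url, extract_values(empty))
```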

Identify any dependencies None

Definition of done All clusters have working MCS_AlertManager_Volume graph in Nagios

Created at 1 day ago
Update Forward Proxy Allow List

Describe the issue iStore Order for: Date Submitted: iStore Number: Lead time (business days after EA Approves):

Definition of done

  • [x] Business area creates/submits order
  • [ ] CITZ Service Desk reviews order, creates iStore Number, sends to EA for Approval
  • [ ] EA Approves
  • [ ] Order is sent to DXC for fulfillment
  • [ ] Once order is fulfilled/shipped by DXC, CITZ Service Desk sends 'Completed Order' notification to business area

iStore Order Lifecycle.png

Created at 5 days ago
Update Forward Proxy Allow List

Replaced by #3109

Created at 5 days ago
Review ROSA todo

Describe the issue Now that we have a ROSA cluster installed in SEA with public access, spend some time reviewing and making tickets of what else needs to be completed and tested to have a fully running cluster.

What is the Value/Impact?

  • Ensure efficient use of RH consulting hours
  • Progress towards completing the ROSA SCO

What is the plan? How will this get completed? Review SCO deliverables, make some tickets, review with Phil

Identify any dependencies Need some time from Phil near the end of working on this

Definition of done Tickets created of remaining ROSA work to do

Created at 5 days ago