StevenBarre
Repos: 18 · Followers: 2

Collection of platform related tools and configurations

12
29

Gitbook URL of WIKI

16
20

OpenShift 3 and 4 product and community documentation

642
1430

Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain. Automate everything from code deployment to network configuration to cloud management, in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com.

56788
22216

Battlesnake Docs and API Reference

13
21

A simple Battlesnake written in Python.

0
0

Events

Investigate IBM MQ Operator

Andrew is out of office this week.

Created at 6 hours ago
Plan for preventing CronJobs from running too often

Email sent to 15 teams requesting feedback

Created at 6 hours ago
Revert OpenShift DNS resolver configurations on all managed clusters

CHG0046919 completed for Silver

Created at 6 hours ago
Ensure all clusters have all pull secrets

The values in the vaults for all clusters should be compared (see https://github.com/bcgov-c/rhcos-ignition-builder/blob/master/inventory/group_vars/klab_util/vars.yaml#L23).

The secret actually installed in each cluster should also be checked to confirm it matches the vault value: https://console.apps.klab.devops.gov.bc.ca/k8s/ns/openshift-config/secrets/pull-secret
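A minimal sketch of the second check, assuming bash and that the vault value has already been exported to a local file (`vault-klab.json` is a hypothetical name; how to export it depends on the vault layout, which is not shown here). The helper compares the two documents byte for byte after stripping whitespace, so an equivalent secret with reordered keys is reported as a mismatch and should be inspected by hand.

```bash
#!/usr/bin/env bash
# same_pull_secret FILE_A FILE_B  (hypothetical helper)
# Succeeds (exit 0) when the two .dockerconfigjson documents are identical once
# all whitespace is removed. Key order matters, so a mismatch means "inspect by
# hand", not proof that the credentials differ.
same_pull_secret() {
  cmp -s <(tr -d '[:space:]' < "$1") <(tr -d '[:space:]' < "$2")
}

# Example against a live cluster (requires a logged-in `oc`):
#   oc get secret/pull-secret -n openshift-config \
#     --template='{{index .data ".dockerconfigjson" | base64decode}}' > live-klab.json
#   same_pull_secret vault-klab.json live-klab.json && echo match || echo MISMATCH
```

Running the same comparison per cluster covers both halves of the ticket: vault-to-vault and vault-to-installed.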

Created at 3 days ago
Plan for preventing CronJobs from running too often

Get the license plates of namespaces with fast CronJobs:

oc get cj -A | grep -P '  \*(\/[1-4])?\s' | cut -f 1 -d' ' | cut -f1 -d'-' | sort -u
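The `grep` above depends on the column spacing of the table output. A sketch of an alternative, assuming bash: pull each namespace and schedule with `-o jsonpath` and test the minute field directly. The helper names are hypothetical, and like the one-liner it only flags a minute field of `*` or `*/N` with N below 5, so lists and ranges such as `0,2,4` or `0-10` are not caught.

```bash
#!/usr/bin/env bash
# fast_minute_field SCHEDULE  (hypothetical helper)
# Succeeds when the minute field of a 5-field cron schedule is "*" or "*/N"
# with N < 5, i.e. the job fires more often than every 5 minutes.
fast_minute_field() {
  local minute=${1%% *}              # first space-separated field
  case $minute in
    '*') return 0 ;;
    '*/'*)
      local step=${minute#'*/'}
      # 10# forces base 10 so a value like "08" is not read as octal
      [[ $step =~ ^[0-9]+$ ]] && (( 10#$step < 5 )) ;;
    *) return 1 ;;
  esac
}

# fast_cron_plates  (hypothetical helper)
# Reads "namespace<TAB>schedule" lines on stdin and prints the unique license
# plates (namespace prefix before the first '-') of fast CronJobs.
fast_cron_plates() {
  local ns schedule
  while IFS=$'\t' read -r ns schedule; do
    fast_minute_field "$schedule" && printf '%s\n' "${ns%%-*}"
  done | sort -u
}

# Usage against a cluster (requires a logged-in `oc`):
#   oc get cj -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.spec.schedule}{"\n"}{end}' \
#     | fast_cron_plates
```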

Drafted email and sent to Sal/Olena for review

Created at 3 days ago
Enable HTTP/2 on routers

Not going to move forward with implementing this in PROD as only one user is looking for it. Closing per direction from Sal.

Created at 3 days ago
Enable HTTP/2 on routers

Describe the issue: A developer is looking to use HTTP/2 and gRPC connections in Silver. https://cloud.redhat.com/blog/grpc-or-http/2-ingress-connectivity-in-openshift

Additional context: https://docs.openshift.com/container-platform/4.7/networking/ingress-operator.html#nw-http2-haproxy_configuring-ingress

oc annotate ingresses.config/cluster ingress.operator.openshift.io/default-enable-http2=true
  • RC user: ytqsl
  • namespace: b5e079-dev

Definition of done

  • Discuss with RH whether this is safe to enable or has any side effects, since it's not enabled by default
  • If safe, enable in LAB and test
  • Enable in PROD
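For the "enable in LAB and test" step, a minimal sketch, assuming bash and a `curl` recent enough to support the `%{http_version}` write-out variable. The helper names and route URL are hypothetical. The linked OpenShift docs note that a route using the default certificate cannot use HTTP/2, so the test route needs its own certificate.

```bash
#!/usr/bin/env bash
# negotiated_http_version URL  (hypothetical helper)
# Prints the HTTP version curl ended up using with URL: "2" when the router
# negotiated HTTP/2 via ALPN, "1.1" when it fell back.
negotiated_http_version() {
  curl -s -o /dev/null --http2 -w '%{http_version}\n' "$1"
}

# route_uses_http2 URL: succeeds only when HTTP/2 was negotiated.
route_uses_http2() {
  [[ $(negotiated_http_version "$1") == 2 ]]
}

# Example (hypothetical route host):
#   route_uses_http2 "https://<route-host>/" && echo "HTTP/2 OK"
```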
Created at 3 days ago
Reconcile differences between KLAB2 and Emerald NSX rules

Reviewed the rules and made notes. Booked a meeting for Monday to discuss and implement the cleanup of KLAB2.

Created at 3 days ago
Get Sal setup on console.redhat.com

Meeting booked for Mar 31 with Justin and Olena to discuss

Created at 3 days ago
Get Sal setup on console.redhat.com

Sent Sal an account invite.

Created at 3 days ago
Test and validate using vanity domains in NSX clusters | SPIKE

After some discussion and trial and error, we found that we have to specify the second-level domain in AVI to get this working. We've added bc.ca for now.

Updated the docs, and announced the info in RC.

Will leave in review for this sprint to see if we can get someone to test.

Created at 4 days ago
Update docs on NSX Internet-Ingress label

Describe the issue: A re-read of the SDN Security Classification Model v1.0 document showed that:

Low security classification workloads are considered “Non-Internet Accessible” by default, unless explicitly defined otherwise.

The docs currently have this the other way around.

What is the Value/Impact? The docs need to match the policy.

What is the plan? How will this get completed? Update the Emerald docs page.

Identify any dependencies: https://github.com/bcgov-c/platform-gitops-gen/pull/633

Definition of done: Updated docs and a post in Rocketchat about it.

Created at 4 days ago
Update docs on NSX Internet-Ingress label

Docs updated and posted in RC about the change

Created at 4 days ago
AWS training for tbaker1313

Describe the issue: Complete AWS training and pass the exam.

Additional context: https://explore.skillbuilder.aws

How does this benefit the users of our platform? Skill building, and providing support for AWS OpenShift clusters.

Definition of done

  • [ ] Complete AWS training
  • [ ] Pass AWS exam
Created at 4 days ago
Weekly CCM PROD Push for March 21-23

Describe the issue: This ticket tracks the effort spent creating the next CCM release and promoting it to the three production OpenShift clusters.

Definition of done

  • [x] Create suitable GitHub Release before EOD Friday
  • [x] Create suitable Standard Change RFCs and assign to team members as appropriate
  • [x] Create and receive approval for the PR incrementing the CCM version for PROD clusters
  • [x] Ensure version increment PR is merged before executing any RFCs
  • [x] Execute RFC for SILVER & EMERALD cluster
  • [x] Execute RFC for GOLD cluster
  • [x] Execute RFC for GOLDDR cluster
Created at 4 days ago
Weekly CCM PROD Push for March 21-23

CHG0046836 completed

Created at 4 days ago
Use Ansible Tower to label nodes

Describe the issue: Now that we have many VM worker nodes, there is a risk that several related pods (say, an HA database) could end up on different nodes that are all on the same ESX host. Should that host fail, the HA database would suffer unexpected downtime.

The first step is to use our Ansible Tower instance to query vCenter, find which ESX host holds each worker node, and label the nodes accordingly.

What is the Value/Impact? Improve platform resiliency

What is the plan? How will this get completed?

  • Get a vCenter service account
  • Investigate if vCenter can trigger Tower whenever a VM is vMotioned
  • Write and test playbook to label nodes with their ESX host

Identify any dependencies: Will need help from the Tower team and the VMware team.

Definition of done

  • Playbook that labels nodes appropriately
  • Figure out if this should run via cron or trigger
  • Create a ticket to adjust the OpenShift scheduler so pods use the new labels to keep themselves apart
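A minimal sketch of the labeling step, assuming bash and that the playbook (or a vCenter query) has already produced `node<TAB>esx-host` pairs. The label key `example.com/esx-host` and the helper name are hypothetical. It prints `oc label` commands instead of running them, so the output can be reviewed first; ESX host names must also be valid label values (63 characters or fewer).

```bash
#!/usr/bin/env bash
# Hypothetical label key; choose the real key before use.
ESX_LABEL_KEY=${ESX_LABEL_KEY:-example.com/esx-host}

# esx_label_commands  (hypothetical helper)
# Reads "node<TAB>esx-host" lines on stdin and prints one `oc label` command
# per node. --overwrite lets a rerun update the label after a vMotion.
esx_label_commands() {
  local node host
  while IFS=$'\t' read -r node host; do
    [[ -n $node && -n $host ]] || continue   # skip blank or incomplete lines
    printf 'oc label node %s %s=%s --overwrite\n' "$node" "$ESX_LABEL_KEY" "$host"
  done
}

# Example:
#   printf 'worker-1\tesx01\nworker-2\tesx02\n' | esx_label_commands
```

Once the label exists, a workload could be spread across ESX hosts with a `podAntiAffinity` rule whose `topologyKey` is that label key, which is the kind of change the follow-up scheduler ticket would cover.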
Created at 5 days ago
Investigate vROps as a long-term metric store

Describe the issue: Investigate the feasibility of using vROps as a long-term metric store instead of Nagios. This would let us graph cluster capacity over long time frames, such as a year, and produce better reports.

What is the Value/Impact? Improved capacity planning

What is the plan? How will this get completed? TBD

Identify any dependencies: TBD

Definition of done: TBD

Created at 5 days ago
NSX Testing - Test the other direction

Describe the issue: We have many tests from OCP to classic servers, but none in the other direction. We can use Ansible Tower to run commands on our test hosts and have them curl back into OCP routes.

What is the Value/Impact? Continue improving the testing of the NSX guardrails

What is the plan? How will this get completed?

  • Create playbook in Ansible Tower that runs the checks and updates the testing configmap
  • Figure out how to trigger the playbook from the nsx util host on demand
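A minimal sketch of the check a test host could run, assuming bash and `curl` on the host. The helper names, URLs, and allow/deny expectations are hypothetical inputs. curl reports status `000` when no HTTP response comes back (refused, reset, timed out, or DNS failure), which is how a blocking guardrail looks from the client side.

```bash
#!/usr/bin/env bash
# route_reachable URL  (hypothetical helper)
# Succeeds when any HTTP response came back within 5 seconds; a status of
# "000" means no response at all.
route_reachable() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")
  [[ $code != 000 ]]
}

# check_expectation URL allow|deny  (hypothetical helper)
# Prints PASS or FAIL by comparing actual reachability with the expectation.
check_expectation() {
  local url=$1 expected=$2 actual=deny
  route_reachable "$url" && actual=allow
  if [[ $actual == "$expected" ]]; then
    printf 'PASS %s %s\n' "$expected" "$url"
  else
    printf 'FAIL expected=%s actual=%s %s\n' "$expected" "$actual" "$url"
  fi
}
```

The PASS/FAIL lines are plain text, so the playbook could collect them and write them into the testing configmap.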

Identify any dependencies: May need some help from the Tower team.

Definition of done: A PR created and merged for the test suite that triggers the checks on the test hosts and collects their results.

Created at 5 days ago
Reconcile differences between KLAB2 and Emerald NSX rules

Describe the issue: As I've been building the test suite, Dan and I have been making changes to the KLAB2 guardrails to ensure they are doing what we expect.

Now we need to do a review pass and make sure all the rules make sense and there are no extras or leftovers that need cleaning up. Then we need to copy the "good set" of rules from KLAB2 to Emerald.

What is the Value/Impact? Ensure guardrails are doing what we expect in PROD

What is the plan? How will this get completed?

  • Review KLAB2 rules and check hit counters
  • Clean up any superfluous rules
  • Diff the KLAB2 and Emerald rules
  • Meet with Dan to implement changes to Emerald rules
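A minimal sketch of the diff step, assuming bash and that each cluster's rules have been exported to a text file with one normalized rule per line (hit counters and IDs stripped). The file names and helper name are hypothetical.

```bash
#!/usr/bin/env bash
# rule_diff KLAB2_FILE EMERALD_FILE  (hypothetical helper)
# Prints rules present in only one export:
#   "klab2-only: <rule>"    in KLAB2 but not Emerald
#   "emerald-only: <rule>"  in Emerald but not KLAB2
# comm(1) needs sorted input, and sort/comm must agree on collation, so both
# run under LC_ALL=C. comm -3 indents file-2-only lines with one tab.
rule_diff() {
  LC_ALL=C comm -3 <(LC_ALL=C sort -u "$1") <(LC_ALL=C sort -u "$2") |
    awk -F'\t' '{ if ($1 == "") print "emerald-only: " $2; else print "klab2-only: " $1 }'
}

# Example:
#   rule_diff klab2-rules.txt emerald-rules.txt
```

An empty result corresponds to the "rules are the same between clusters" item in the definition of done.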

Identify any dependencies: Will need help from Dan to make changes to NSX rules.

Definition of done

  • Test suite passes in Emerald
  • Rules are the same between clusters
Created at 5 days ago
Create a policy to prevent CronJobs from running too often

Split the ticket into two. Moved all the planning to #3689

This ticket is now just for the implementation in PROD.

Created at 5 days ago
Plan for preventing CronJobs from running too often

Describe the issue: CronJobs that run more often than every 5 minutes impact the performance of the k8s API and the cluster.

What is the Value/Impact? Improved cluster performance and stability

What is the plan? How will this get completed?

  • Discuss with stakeholders - this will be a big change and impact a lot of teams that are currently doing this
  • Decide if we will proceed
  • Test Kyverno policy in a LAB
  • Announce via a Community Meetup and email blast

Identify any dependencies: Need input from the Product Owner, possibly some of the most impacted teams, and @jleach.

Definition of done

  • Planned implementation date selected
  • Announcements made
Created at 5 days ago
Fix issues with the pipelines operator and SCC

We should open a case with RH to confirm the steps and impacts of the change

Created at 5 days ago
Weekly CCM PROD Push for March 21-23

CHG0046835 completed

Created at 5 days ago

Update ops-and-shared-services.md

Created at 6 days ago
Test and validate using vanity domains in NSX clusters | SPIKE

Dan is talking to our VMware TAM about this. Had a call with Dan today to discuss some of the options and review the docs. We may need to explicitly list vanity domains in AVI to get this working. Still looking for a self-serve-friendly solution.

Created at 6 days ago