Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain. Automate everything from code deployment to network configuration to cloud management, in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com.
Andrew is out of office this week.
Email sent to 15 teams requesting feedback
CHG0046919 completed for Silver
https://github.com/bcgov-c/rhcos-ignition-builder/blob/master/inventory/group_vars/klab_util/vars.yaml#L23 The values in the vaults for all clusters should be compared.
The secret actually installed in each cluster should also be checked to confirm it matches the vault value: https://console.apps.klab.devops.gov.bc.ca/k8s/ns/openshift-config/secrets/pull-secret
Get license plates with fast crons
oc get cj -A | grep -P ' \*(\/[1-4])?\s' | cut -f 1 -d' ' | cut -f1 -d'-' | sort -u
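The grep pattern above also appears to match a bare `*` in any later schedule field (for example the hour field of `0 * * * *`), so it may over-report. A minimal sketch of a stricter filter, assuming the default `oc get cj -A` column layout, where the minute field of SCHEDULE is the third whitespace-separated column. `fast_cron_plates` is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: reads `oc get cj -A` output on stdin and prints the
# unique license plates (namespace prefix before the first '-') of CronJobs
# whose minute field is `*` or `*/1` .. `*/4`.
# Assumption: column 3 is the minute field of the schedule.
fast_cron_plates() {
  awk '$3 ~ /^\*(\/[1-4])?$/ { print $1 }' \
    | cut -d'-' -f1 \
    | sort -u
}

# Usage against a live cluster:
#   oc get cj -A | fast_cron_plates
```

This only inspects step-style minute fields. Schedules that list minutes explicitly (such as `0,2,4,...`) would need a separate check.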
Drafted email and sent to Sal/Olena for review
Not going to move forward with implementing this in PROD as only one user is looking for it. Closing per direction from Sal.
Describe the issue A developer is looking to use http/2 and gRPC connections in Silver. https://cloud.redhat.com/blog/grpc-or-http/2-ingress-connectivity-in-openshift
Additional context https://docs.openshift.com/container-platform/4.7/networking/ingress-operator.html#nw-http2-haproxy_configuring-ingress
oc annotate ingresses.config/cluster ingress.operator.openshift.io/default-enable-http2=true
Definition of done
Reviewed rules and made notes. Booked a meeting for Monday to discuss and implement the cleanup of KLAB2.
Meeting booked for Mar 31 with Justin and Olena to discuss
Sent Sal an account invite.
After some discussion and trial and error, we found that we have to specify the second-level domain in AVI to get this working. We've added bc.ca for now.
Updated the docs, and announced the info in RC.
Will leave in review for this sprint to see if we can get someone to test.
Describe the issue A re-read of the SDN Security Classification Model v1.0 document showed that
Low security classification workloads are considered “Non-Internet Accessible” by default, unless explicitly defined otherwise.
The docs currently have this the other way around.
What is the Value/Impact? Need the docs to match the policy
What is the plan? How will this get completed? Update the Emerald docs page.
Identify any dependencies https://github.com/bcgov-c/platform-gitops-gen/pull/633
Definition of done Updated docs and a post in Rocketchat about it
Docs updated and posted in RC about the change
Describe the issue Complete AWS training and pass the exam.
Additional context https://explore.skillbuilder.aws
How does this benefit the users of our platform? Skill building and to provide support for AWS Openshift clusters.
Definition of done
Describe the issue This ticket will track efforts spent toward creating the next CCM release and promoting it against the three production Openshift clusters.
Definition of done
CHG0046836 completed
Describe the issue Now that we have a bunch of VM worker nodes, there is a risk that several related pods (say, an HA database) could end up on different nodes that are all on the same ESX host. Should that host fail, the HA DB would suffer unexpected downtime.
The first step here is using our Ansible Tower instance to query vCenter, find which ESX hosts are holding which worker nodes, and label the nodes accordingly.
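A minimal sketch of the labeling half of that first step, assuming the vCenter query has already produced a two-column `node esx-host` list. The label key `example.com/esx-host` and the function name `emit_esx_labels` are hypothetical placeholders. The function only prints the `oc label` commands so they can be reviewed before anything is run:

```shell
# Hypothetical label key; replace with whatever key the team agrees on.
ESX_LABEL_KEY="example.com/esx-host"

# Reads "<node-name> <esx-host>" pairs on stdin and prints one
# `oc label node` command per pair. Blank lines, '#' comment lines, and
# lines without a host column are skipped.
emit_esx_labels() {
  while read -r node host rest; do
    case "$node" in
      ''|'#'*) continue ;;
    esac
    [ -n "$host" ] || continue
    printf 'oc label node %s %s=%s --overwrite\n' "$node" "$ESX_LABEL_KEY" "$host"
  done
}

# Usage (mapping file name is an assumption):
#   emit_esx_labels < node-to-esx.txt
```

Once nodes carry such a label, it could serve as the `topologyKey` in pod anti-affinity or topology spread constraints.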
What is the Value/Impact? Improve platform resiliency
What is the plan? How will this get completed?
Identify any dependencies Will need help from the Tower team and the VMware team
Definition of done
Describe the issue Investigate the feasibility of using vROps as a long-term metric store instead of Nagios. This will help us make graphs showing cluster capacity over long time frames, such as a year, and produce better reports.
What is the Value/Impact? Improved capacity planning
What is the plan? How will this get completed? TBD
Identify any dependencies TBD
Definition of done TBD
Describe the issue We've got a bunch of tests for OCP to classic servers, but nothing the other way. We can leverage Ansible Tower to run commands on our test hosts and have them curl back into OCP routes.
What is the Value/Impact? Continue improving the testing of the NSX guardrails
What is the plan? How will this get completed?
Identify any dependencies May need some help from the Tower team
Definition of done PR created and merged for the test suite that will trigger and get the result of checks run on the test hosts
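A minimal sketch of the kind of check a Tower job could run on a test host, assuming `curl` is available there. The function name `check_route`, the example URL, and the expected status codes are placeholders, not part of the existing test suite:

```shell
# Hypothetical check: curl a route and compare the HTTP status with the
# expected one. Prints PASS/FAIL and returns 0 on match, 1 otherwise.
check_route() {
  url=$1
  expected=$2
  # %{http_code} is 000 when no HTTP response was received (e.g. a timeout
  # caused by a blocked connection); `|| true` keeps curl's non-zero exit
  # from aborting callers that use `set -e`.
  code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url" || true)
  if [ "$code" = "$expected" ]; then
    echo "PASS $url $code"
  else
    echo "FAIL $url expected=$expected got=$code"
    return 1
  fi
}

# Usage (URL is an assumption):
#   check_route "https://myapp.apps.example/" 200   # expect reachable
#   check_route "https://myapp.apps.example/" 000   # expect blocked
```

Returning a non-zero status on mismatch lets the calling job treat the check as failed without parsing the output.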
Describe the issue As I've been building the test suite, Dan and I have been making changes to the KLAB2 guardrails to ensure they are doing what we expect.
Now we need to do a review pass and make sure all the rules make sense and there are no extras or leftovers that need cleaning up. Then we need to copy the "good set" of rules from KLAB2 to Emerald.
What is the Value/Impact? Ensure guardrails are doing what we expect in PROD
What is the plan? How will this get completed?
Identify any dependencies Will need help from Dan to make changes to NSX rules
Definition of done
Split the ticket into two. Moved all the planning to #3689
This ticket is now just for the implementation in PROD.
Describe the issue CronJobs that run more often than every 5 minutes impact the performance of the k8s API and the cluster.
What is the Value/Impact? Improved cluster performance and stability
What is the plan? How will this get completed?
Identify any dependencies Need input from the Product Owner, maybe some of the most impacted teams, and @jleach
Definition of done
We should open a case with RH to confirm the steps and impacts of the change
CHG0046835 completed
Update ops-and-shared-services.md
Dan is talking to our VMware TAM about this. Had a call with Dan today to discuss some of the options and review the docs. We may need to explicitly list vanity domains in AVI to get this working. Still looking for a self-serve friendly solution.