ggaaooppeenngg
Repos
70
Followers
172
Following
28

Events

support monitor podgroup

I suggest adding PodGroup as the custom resource by default when enable-gang-scheduling is enabled.

Created at 3 weeks ago

Add strict check for dependent callbacks

Signed-off-by: Peng Gao peng.gao.dut@gmail.com

Created at 3 weeks ago
Make a generic logger instead of the nil logger on dependent update

This PR fixes the earlier incorrect PR #1666 and uses generic functions for the PodGroup callbacks. It also adds a test for the XXXFunc callbacks, which handle only Pod and Service objects rather than other generic objects. @johnugeorge @D0m021ng @Crazybean-lwb

Created at 3 weeks ago

Use generic callback funcs for podgroup

Signed-off-by: Peng Gao peng.gao.dut@gmail.com

Add strict check for dependent callbacks

Signed-off-by: Peng Gao peng.gao.dut@gmail.com

Created at 3 weeks ago
OnDependentUpdateFunc for Job will panic when enable volcano scheduler

@johnugeorge It is for the pod. For other objects, it uses the generic function. https://github.com/kubeflow/training-operator/blob/74655a177a7d886d6e2d9c86c800271d4d44687b/pkg/controller.v1/mpi/mpijob_controller.go#L207

Created at 3 weeks ago
OnDependentUpdateFunc for Job will panic when enable volcano scheduler

@johnugeorge Only the MPIJob controller uses OnDependentXXXFuncGeneric, which initializes the generic logger at the beginning. I am trying to add a test for it.

Created at 3 weeks ago
OnDependentUpdateFunc for Job will panic when enable volcano scheduler

I only tested it with MPIJob. I made a temporary fix for this issue. @D0m021ng @johnugeorge

Created at 3 weeks ago
pull request opened
Make a generic logger instead of the nil logger on dependent update

Signed-off-by: Peng Gao peng.gao.dut@gmail.com

What this PR does / why we need it:

Make a temporary logger for generic objects instead of leaving the logger nil.

Which issue(s) this PR fixes

Fixes #1679 #1678

Checklist:

  • [ ] Docs included if any changes are user facing
Created at 3 weeks ago
ggaaooppeenngg create branch fix-volcano-panic
Created at 3 weeks ago
started
Created at 3 weeks ago
[QUESTION]

I created a PVC, mounted it, and wrote some files to check persistence. It seems the CSI driver can work without the setting metadata, unless I am missing something.

Created at 1 month ago
[QUESTION]

Question

Before asking a question, make sure you have:

  • Reviewed relevant Kubernetes information: Google your error messages and look at K8s docs.
  • Searched open and closed GitHub issues
  • Read the documentation: JuiceFS CSI Driver Doc

What help did you want: juicefs mount fails with the error "load setting: database is not formatted, please run juicefs format ...". In Redis, the setting key appears to have been lost, unless I am missing something. The CSI driver keeps working, and I am afraid that running format will destroy the metadata. How can I recover from this state?

redis-cli -u redis://0.0.0.0:6379/7
0.0.0.0:6379[7]> scan 0
1) "0"
2) 1) "sessions"
   2) "totalInodes"
   3) "usedSpace"
0.0.0.0:6379[7]> get setting
(nil)

Environment:

  • JuiceFS CSI Driver version (which image tag did your CSI Driver use): v0.7.0
  • Kubernetes version (e.g. kubectl version): 1.20.1
  • Object storage (cloud provider and region): minio
  • Metadata engine info (version, cloud provider managed or self maintained): redis
  • Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage): ok
  • Others: When I tested it with juicefs 0.10.0 several months ago, the mount command worked. But now it fails.
Created at 1 month ago
Add PodGroup as controller watch source

done @shinytang6 /assign @gaocegege

Created at 1 month ago

Remove extra blank lines

Signed-off-by: Peng Gao peng.gao.dut@gmail.com

Created at 1 month ago
Add PodGroup as controller watch source

Need a review @shinytang6

Created at 1 month ago
issue comment
Add job suspend semantics

How is this PR going now?

Created at 1 month ago
started
Created at 1 month ago
Add PodGroup as controller watch source

It's not easy to manage an external CRD. I skip the watch if the CRD is not found.

Created at 1 month ago

Remove the PodGroup crd

Signed-off-by: Peng Gao peng.gao.dut@gmail.com

Created at 1 month ago