mrdavidlaing
Repos: 103 · Followers: 105 · Following: 2

A data orchestrator for machine learning, analytics, and ETL.

0
0

Koans to learn Javascript

2275
2743

Events

Add some helpful command line tools

Created at 4 days ago

Setup google-cloud-sdk

Created at 1 week ago
vdk-core: New feature: StandaloneDataJob

@tozka Am I correct in deducing that this feature landed in https://pypi.org/project/vdk-core/0.2.536648262/ ?

Created at 1 week ago
mrdavidlaing create branch main
Created at 1 week ago
mrdavidlaing create repository
Created at 1 week ago
mrdavidlaing create branch main
Created at 1 week ago
mrdavidlaing create repository
Created at 1 week ago
vdk-core: New feature: StandaloneDataJob

@tozka I've addressed the Code Analysis warnings and squashed the commits into a single commit to ease merging.

I've also run projects/vdk-core/cicd/build.sh on my local machine. It seemed to succeed, barring the failure below, which I don't think is related to my changes:

============================================================================== short test summary info ===============================================================================
FAILED tests/functional/run/test_run_sql_queries.py::test_run_dbapi_connection_no_such_db_type - assert 'VdkConfigurationError' in '2022-05-11 14:34:25,169 [VDK] simple-create-ins...

Results (42.93s):
     253 passed
       1 failed
         - tests/functional/run/test_run_sql_queries.py:72 test_run_dbapi_connection_no_such_db_type

Are you happy to progress to merging this PR?

Created at 1 week ago

vdk-core: New feature: StandaloneDataJob (#791)

  • Instantiate and execute plugin lifecycle from code rather than via the VDK CLI
  • Gives access to an instantiated job_input object
  • Can be run without needing any data job files
  • Implemented as a contextmanager to reduce API surface area
  • Triggers all plugin hooks except CoreHookSpecs.vdk_command_line

Sample usage:

with StandaloneDataJobFactory.create(datajob_directory) as job_input:
    # ... use the job_input object to interact with SuperCollider
    ...
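
The context-manager shape described in this commit can be sketched generically. This is not VDK's implementation: `FakeJobInput`, `create_standalone_job`, and the event names are hypothetical stand-ins, used only to show how a context manager can run setup, hand out a job input object, and guarantee that the exit step runs even when the body raises.

```python
from contextlib import contextmanager


class FakeJobInput:
    """Hypothetical stand-in for a job input object (not VDK's class)."""

    def __init__(self):
        self.queries = []

    def execute_query(self, sql):
        # A real job input would talk to a database; this one only records.
        self.queries.append(sql)


@contextmanager
def create_standalone_job(events):
    """Run setup, yield a job input, and always run the exit step."""
    events.append("initialize")  # assumed setup step
    job_input = FakeJobInput()
    try:
        yield job_input
    except Exception:
        events.append("exception")  # assumed failure hook
        raise
    finally:
        events.append("exit")  # runs on success and on failure


events = []
with create_standalone_job(events) as job_input:
    job_input.execute_query("SELECT 1")
```

Keeping setup and teardown inside one context manager means callers only ever see the yielded object, which is one way to keep the API surface small.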
                                
Created at 1 week ago

control-service: Kerberos authentication IT (#798)

why: As part of the recently introduced functionality to provide Kerberos authentication, we needed to add an integration test with a running control service.

what: Added Kerberos authentication tests against a running control service. The test creates its own Kerberos server (using Apache Kerby) and creates a unique keytab, used to authenticate against the server each time. The test verifies that authentication works as expected.

tests: n/a

Signed-off-by: Momchil Zhivkov mzhivkov@vmware.com

Revert "control-service: Kerberos authentication IT (#798)" (#828)

This reverts commit 6c5a4b06bc9c11b4496eaa77ab89cea649de80fe.

airflow-provider-vdk: VDKOperator execute method (#823)

This change implements the execute method for the VDKOperator. This method triggers a job execution and is run when the operator is used inside an Airflow DAG.

Testing done: tested locally, unit test

Signed-off-by: Gabriel Georgiev gageorgiev@vmware.com

control-service: unit tests should run on pull requests (#830)

Signed-off-by: mrMoZ1 mzhivkov@vmware.com

control-service: kerberos authentication IT test (#831)

why: As part of the recently introduced functionality to provide Kerberos authentication, we needed to add an integration test with a running control service.

what: Added Kerberos authentication tests against a running control service. The test creates its own Kerberos server (using Apache Kerby) and creates a unique keytab, used to authenticate against the server each time. The test verifies that authentication works as expected.

tests: n/a

Signed-off-by: Momchil Zhivkov mzhivkov@vmware.com

Merge branch 'main' into 785-instantiate_job_input_from_code

Created at 1 week ago

vdk-core: Use step type "noop" (#791)

Signed-off-by: David Laing david@davidlaing.com

Created at 1 week ago

vdk-core: Address Codacy issues (#791)

Signed-off-by: David Laing david@davidlaing.com

Created at 1 week ago
WIP: vdk-core: New feature: NoOpDataJob

@tozka Let me address the remaining Code Analysis warnings and squash the commits to make the merge nice and clean.

I'll ping you here when that is done

Created at 1 week ago
WIP: vdk-core: New feature: NoOpDataJob

@tozka With 54f4a09 done I'm happy that we have a basic set of functional tests. Are there any other tests you'd like to see added?

Created at 2 weeks ago

vdk-core: Test vdk_exit and vdk_exception hooks are called (#791)

Signed-off-by: David Laing david@davidlaing.com

Created at 2 weeks ago

vdk-core: Test hooks are called (#791)

Signed-off-by: David Laing david@davidlaing.com

Created at 2 weeks ago

vdk-core: Add basic functional test (#791)

Signed-off-by: David Laing david@davidlaing.com

Created at 2 weeks ago

Add Google Cloud SDK

Manage .zprofile & fix bug with PATH

Created at 2 weeks ago

Setup VSCode, Homebrew, Ruby & Shopify Dev settings for zsh

Created at 2 weeks ago

Add .gitconfig

Add pyenv-virtualenv & upgrade meetingbar

Created at 2 weeks ago

control-service: fix job builder unit tests (#792)

The unit tests have been failing because of a wrong expected type: they expect a String, but the value is NULL.

Testing done: unit tests

Signed-off-by: Miroslav Ivanov miroslavi@vmware.com

versatile-data-kit: Update CONTRIBUTING.md with links to coding standard (#794)

  • versatile-data-kit: Update CONTRIBUTING.md with links to a coding standard

The coding standard is not easy to find, and it is generally recommended to include it in CONTRIBUTING.md.

Signed-off-by: Antoni Ivanov aivanov@vmware.com

airflow-provider-vdk: Start and cancel job execution methods (#778)

This change allows job executions to be started and cancelled through the VDK connection hook object.

Testing done: included two unit tests, tested locally against deployed control-service

Signed-off-by: Gabriel Georgiev gageorgiev@vmware.com

airflow-provider-vdk: Job execution status and logs method (#796)

This change implements a method for the VDK connection hook which allows a user to check the status of a job execution in the VDK control-service. It also implements a method which allows a user to download the stored job execution logs for a particular execution.

Testing done: locally tested and verified that the correct object and correct log string are returned, implemented 2 unit tests

Signed-off-by: Gabriel Georgiev gageorgiev@vmware.com

[pre-commit.ci] pre-commit autoupdate (#799)

updates:

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

VEP-554: Extend Security and Permissions section (#786)

  • VEP-554: Extend Security and Permissions section

As part of the development of the Versatile Data Kit Airflow Provider, we need to ensure that all connections to the Control Service are properly authenticated. For this to happen, we need to define the authentication mechanisms and flows that would be used.

This change extends the Security and Permissions section of the enhancement proposal to include more details on how authentication would be facilitated in the Airflow Provider.

Testing Done: Documentation change, no testing needed.

Signed-off-by: Andon Andonov andonova@vmware.com

Co-authored-by: Gabriel Georgiev 45939426+gageorgiev@users.noreply.github.com

vdk-plugins: Introduce vdk-control-api-auth plugin (#801)

As more plugins require authentication, the need arises for a dedicated utility that provides such functions.

This change introduces the initial structure of a library plugin whose purpose is to facilitate authentication with third-party services.

Testing Done: None

Signed-off-by: Andon Andonov andonova@vmware.com

airflow-provider-vdk: VDKSensor initial structure (#800)

This change lays out the initial structure for the VDK asynchronous operator, a.k.a. the VDKSensor. The purpose of this sensor is to routinely poke the VDK control-service and check the status of a particular job, so that its completion/failure can trigger some activity in an Airflow DAG, i.e. trigger another job.

Signed-off-by: Gabriel Georgiev gageorgiev@vmware.com

vdk-lineage: introducing POC (pre-alpha) implementation (#783)

The VDK Lineage plugin collects lineage information (input data -> job -> output data) and sends it to a pre-configured destination.

It is currently at POC level (pre-alpha release). It collects lineage information for each job run and for each executed query. Query lineage is currently recorded before the query is executed (so no query status is logged), as we lack the necessary hooks.

There are plenty of TODOs in the code that need to be addressed, and more tests to add, before raising the maturity level.

Testing Done: there are some tests, and it was also tested manually:

vdk marquez-server --start
vdk run some-job
vdk marquez-server --stop

Signed-off-by: Antoni Ivanov aivanov@vmware.com

vdk-core: minor refactoring in managed_cursor to reduce long method (#803)

Testing Done: unit tests

Signed-off-by: Antoni Ivanov aivanov@vmware.com

vdk-core: print query duration (#804)

why: Users requested clearer examples on how long given operations took: "- In concrete case problems was with SQL query , it would be nice if we can print profile details (especially if query fails)"

what: Added new log statements that show how long an SQL query took. Log format is 'Recovered query duration 00h:00m:09s'
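
A minimal sketch of producing a duration string in the `00h:00m:09s` shape quoted above. The helper names `format_duration` and `timed` are hypothetical, and the log prefix used here is illustrative; this is not the code added in the commit.

```python
import time


def format_duration(seconds):
    """Format a duration in seconds as HHh:MMm:SSs, e.g. 00h:00m:09s."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}h:{minutes:02d}m:{secs:02d}s"


def timed(run_query, log):
    """Run a query callable and log how long it took, even if it fails."""
    start = time.monotonic()
    try:
        return run_query()
    finally:
        log(f"Query duration {format_duration(time.monotonic() - start)}")


messages = []
result = timed(lambda: 42, messages.append)
```

Logging from a `finally` block means the duration is reported for failing queries too, which matches the user request quoted in the commit.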

testing: Added unit tests.

Signed-off-by: Momchil Zhivkov mzhivkov@vmware.com

vdk-core: Verify payload after pre-processing it (#777)

  • vdk-core: Verify payload after pre-processing it

Ingestion allows pre-processing payloads. Currently, if the payload is only made valid by curating it during pre-processing, verification fails. Ingestion pre/post processors are initialized using predefined config vars. These vars should have the VDK_ prefix, which is not added automatically at this point.

Verify payload after pre-processing it, since this pre-processing might be responsible for making it serializable. Add VDK_ prefix to configs so that ingestion pre/post processors are properly initialized.
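
The ordering described above can be sketched as follows, assuming that "verification" means checking the payload is JSON-serializable. The functions `stringify_datetimes`, `verify_payload`, and `ingest` are hypothetical and do not reflect vdk-core's internals.

```python
import json
from datetime import datetime


def stringify_datetimes(payload):
    """Hypothetical pre-processor: turn datetime values into ISO strings."""
    return {
        key: value.isoformat() if isinstance(value, datetime) else value
        for key, value in payload.items()
    }


def verify_payload(payload):
    """Raise TypeError if the payload cannot be serialized to JSON."""
    json.dumps(payload)


def ingest(payload, pre_processors, send):
    """Pre-process first, then verify, then send."""
    for process in pre_processors:
        payload = process(payload)
    verify_payload(payload)  # verifying earlier would reject the raw payload
    send(payload)


raw = {"id": 1, "created": datetime(2022, 5, 11, 14, 34, 25)}
sent = []
ingest(raw, [stringify_datetimes], sent.append)
```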

Tested by: locally run ingestion job which curates json during pre-processing

Signed-off-by: Yana Zhivkova yzhivkova@vmware.com

vdk-core: Allow logs to be sent to an endpoint using SysLog (#807)

This change allows cloud executions of jobs to have an endpoint configured, to which logs are sent automatically using SysLog.

Testing done: ran job with a configured endpoint and observed logs in the endpoint

Signed-off-by: Gabriel Georgiev gageorgiev@vmware.com

versatile-data-kit: release 0.3 (#810)

Update minor versions for new release

Signed-off-by: Antoni Ivanov aivanov@vmware.com

versatile-data-kit: link examples wiki in the git examples (#812)

  • versatile-data-kit: link examples tutorials in the git examples directory

The examples directory in git generally contains only the source code, while the example tutorials and articles are currently written in a wiki.

Signed-off-by: Antoni Ivanov aivanov@vmware.com

versatile-data-kit: update readme with clear slack instructions (#806)

  • versatile-data-kit: update readme with clear slack instructions

Add a link to the channel so that users can join it directly. Going through the CNCF Slack main page takes a lot of time.

Signed-off-by: Antoni Ivanov aivanov@vmware.com

vdk-heartbeat: null datetime conversion fix (#813)

  • vdk-heartbeat: null datetime conversion fix

We've noticed that an error occasionally occurs, with the following stack trace:

File "/opt/buildenv/lib/python3.7/site-packages/vdk/internal/heartbeat/job_controller.py", line 383, in check_job_execution_finished
    self._datetime_from_iso_format(str(e["end_time"])) for e in execution_list
TypeError: '>' not supported between instances of 'NoneType' and 'datetime.datetime'

The error seems to occur when an execution's end_time is not yet populated and max() attempts to compare a datetime with NoneType. The fix presumes that the status of the latest execution with an unpopulated end_time is the freshest, so it returns that if present. Otherwise, it returns the status of the execution with the latest populated end datetime. Discussion of the workflow is at https://github.com/vmware/versatile-data-kit/pull/813#discussion_r854317775. A broader Exception handler is added around the datetime conversion as a preventive step.
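
A minimal sketch of the selection logic described above, under stated assumptions: executions are plain dicts with `status` and `end_time` keys, and "latest" among unfinished executions means last in list order. The function name `latest_execution_status` is hypothetical; this is not the vdk-heartbeat code.

```python
from datetime import datetime


def latest_execution_status(executions):
    """Pick the freshest status without comparing None to a datetime."""
    if not executions:
        return None
    # Executions without an end_time are still running, so treat them as
    # the freshest (assumption: the last one in list order wins).
    unfinished = [e for e in executions if e.get("end_time") is None]
    if unfinished:
        return unfinished[-1]["status"]
    # Every end_time is populated here, so max() compares datetimes only.
    latest = max(executions, key=lambda e: datetime.fromisoformat(e["end_time"]))
    return latest["status"]


executions = [
    {"status": "succeeded", "end_time": "2022-04-20T10:00:00+00:00"},
    {"status": "running", "end_time": None},
]
status = latest_execution_status(executions)
```

Filtering out the `None` values before calling `max()` is what avoids the `TypeError` shown in the stack trace.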

Testing done: retrieved the Execution API response, reproduced the same TypeError with a local code snippet, then verified that the newly introduced fix works. Added a unit test to cover this use case.

Signed-off-by: ikoleva ikoleva@vmware.com

vdk-core: new version check built-in plugin false positive fix (#816)

A bug was detected when using custom new-version-check plugin configurations:

config_builder.set_value(key="PACKAGE_INDEX", value="https://some.repo")
config_builder.set_value(key="PACKAGE_NAME", value="some-package-upgraded-to-latest-locally")
config_builder.set_value("VERSION_CHECK_PLUGINS", False)

That results in a false-positive upgrade message, with an empty package name, being displayed:

************************************************************************
New version for   is available.
Please update to latest version by using:
 pip install --upgrade-strategy eager -U   --extra-index-url
 https://some.repo
************************************************************************

Changed the _check_version method to _print_new_version_message, matching the purpose described in its pydoc. Added handling for the case when no packages are detected for upgrade.
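
The guard described above can be sketched like this. `build_new_version_message` is a hypothetical helper, not the plugin's actual method; the message wording is copied from the output quoted in this commit.

```python
def build_new_version_message(outdated_packages, package_index):
    """Return an upgrade message, or None when nothing needs upgrading."""
    if not outdated_packages:
        # No packages to upgrade: skip the message instead of printing one
        # with an empty package name.
        return None
    names = " ".join(outdated_packages)
    border = "*" * 72
    return "\n".join(
        [
            border,
            f"New version for {names} is available.",
            "Please update to latest version by using:",
            f" pip install --upgrade-strategy eager -U {names}"
            f" --extra-index-url {package_index}",
            border,
        ]
    )


message = build_new_version_message(["some-package"], "https://some.repo")
```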

Testing Done: added a test_no_new_version_check test to cover that use case. The test detects the issue before the changes and succeeds after they are applied.

Signed-off-by: ikoleva ikoleva@vmware.com

vdk-core: encapsulate router-specific properties logic (#817)

  • vdk-core: encapsulate router-specific properties logic

This MR attempts to simplify external usages of PropertiesRouter, bringing it in line with IngesterRouter. The reason is that IngesterRouter supports multiple ingesters evaluated in sequence, whereas PropertiesRouter so far supports only a single properties back-end implementation.

The PropertiesRouter interface no longer exposes a singleton implementation; instead, it implements the IProperties interface and does the routing internally. IngesterRouter already implements the IIngester methods, so its class definition now declares that explicitly, which keeps it coupled to the interface as it evolves.
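
The encapsulation described above can be illustrated with a generic sketch in which a router implements the same interface as its back ends and delegates internally. All names here (`PropertiesInterface`, `InMemoryProperties`, `PropertiesRouterSketch`) and the method signatures are hypothetical; they are not VDK's real classes.

```python
from abc import ABC, abstractmethod


class PropertiesInterface(ABC):
    """Hypothetical stand-in for an IProperties-style interface."""

    @abstractmethod
    def get_property(self, name, default=None):
        ...

    @abstractmethod
    def set_all_properties(self, properties):
        ...


class InMemoryProperties(PropertiesInterface):
    """Trivial back end that keeps properties in a dict."""

    def __init__(self):
        self._data = {}

    def get_property(self, name, default=None):
        return self._data.get(name, default)

    def set_all_properties(self, properties):
        self._data = dict(properties)


class PropertiesRouterSketch(PropertiesInterface):
    """Implements the interface itself and hides the back end it routes to."""

    def __init__(self, backends, default_type):
        self._backends = backends
        self._default_type = default_type

    def _backend(self):
        if self._default_type not in self._backends:
            raise LookupError(f"No back end registered for {self._default_type!r}")
        return self._backends[self._default_type]

    def get_property(self, name, default=None):
        return self._backend().get_property(name, default)

    def set_all_properties(self, properties):
        self._backend().set_all_properties(properties)


router = PropertiesRouterSketch({"memory": InMemoryProperties()}, "memory")
router.set_all_properties({"table": "events"})
```

Callers use the router exactly as they would use a back end, so the choice of back end can change without touching calling code.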

Testing Done: ci/cd. No functional changes made, except interface surface improvements and encapsulation.

Signed-off-by: ikoleva ikoleva@vmware.com

pipelines-control-service: new release (#814)

We need to release the latest bundle, including the patched vdk-heartbeat version >=0.6.520819681.

Increment the version in version.txt file to trigger the release.

Testing Done: n/a

Signed-off-by: ikoleva ikoleva@vmware.com

Created at 2 weeks ago