Airbyte primarily follows the ELT (Extract, Load, Transform) paradigm.
This approach allows for greater flexibility and scalability compared to traditional ETL processes. By loading raw data first, Airbyte enables data teams to perform transformations within the target data warehouse, leveraging its computational power and SQL capabilities.
Airbyte serves as a crucial component in modern data stacks, facilitating:
1. Data consolidation from disparate sources
2. Near-real-time data replication (scheduled and CDC-based syncs)
3. Building data lakes and warehouses
4. Enabling data-driven decision making
For instance, an e-commerce company might use Airbyte to sync customer data from its CRM, transaction data from its payment processor, and inventory data from its ERP system into a central data warehouse for unified analytics.
While Airbyte and dbt are both staples of the modern data stack, they serve distinct purposes:
Airbyte focuses on the 'EL' part of ELT, extracting raw data from various sources and loading it into destinations.
dbt, on the other hand, specializes in the 'T': transformation. It works within your data warehouse to transform raw data into analytics-ready datasets.
In a typical workflow, Airbyte first syncs raw data to a warehouse, then dbt transforms that data into usable models for analysis.
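The hand-off described above can be sketched as a small shell script: trigger a sync, wait for the job to finish, then run dbt. This is a minimal sketch under stated assumptions, not a definitive implementation. `AIRBYTE_URL`, the connection ID, and the helper names are hypothetical; the `/api/v1/connections/sync` and `/api/v1/jobs/get` paths are assumed to match the Config API of your deployed Airbyte version; API authentication, which some releases require, is omitted.

```shell
# Sketch only. Assumptions: AIRBYTE_URL, the connection ID, the helper
# names, and the /api/v1 paths. Requires curl, jq, and dbt on PATH.
AIRBYTE_URL="${AIRBYTE_URL:-http://localhost:8000}"

# Build the JSON body for the sync request.
sync_payload() {
  printf '{"connectionId":"%s"}' "$1"
}

# Map a job status to the next action: ok, fail, or wait.
# An empty or null status (failed request) is treated as a failure.
classify_status() {
  case "$1" in
    succeeded) echo ok ;;
    ''|null|failed|cancelled) echo fail ;;
    *) echo wait ;;
  esac
}

# POST a JSON body to an API path and print the response.
airbyte_post() {
  curl -fsS -X POST "$AIRBYTE_URL$1" \
    -H 'Content-Type: application/json' -d "$2"
}

# Extract and load with Airbyte, then transform with dbt.
run_elt() {
  local job_id status
  job_id="$(airbyte_post /api/v1/connections/sync "$(sync_payload "$1")" \
    | jq -r '.job.id')"
  case "$job_id" in
    ''|null) echo "could not start sync" >&2; return 1 ;;
  esac

  while :; do
    status="$(airbyte_post /api/v1/jobs/get "{\"id\": $job_id}" \
      | jq -r '.job.status')"
    case "$(classify_status "$status")" in
      ok) break ;;
      fail) echo "sync job $job_id ended as '$status'" >&2; return 1 ;;
      *) sleep 30 ;;
    esac
  done

  dbt run  # transform the freshly loaded raw tables in the warehouse
}

# Usage (hypothetical connection ID, run from the dbt project directory):
#   run_elt 00000000-0000-0000-0000-000000000000
```

Waiting for the sync to reach a terminal state before calling dbt is the point of the sketch: running transformations against a half-loaded table is the usual failure mode when the two tools are scheduled independently.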
Despite its strengths, Airbyte has some limitations:
1. The open-source version lacks advanced features like role-based access control.
2. Some users report performance issues with very large data volumes.
3. The community-driven nature of many connectors can lead to varying levels of reliability.
4. Complex transformations may require additional tools or custom coding.
Shakudo seamlessly incorporates Airbyte into its managed data platform. We handle the deployment, scaling, and maintenance of Airbyte, allowing your team to focus on data strategy rather than infrastructure.
Our integration ensures that Airbyte works harmoniously with other components of your data stack, providing a unified experience for data ingestion, transformation, and analysis. This approach exemplifies Shakudo's commitment to offering best-of-breed tools while abstracting away the operational complexities.
<aside>📌 Command-first runbook for customer deployment calls. Replace placeholders before running. For production environments, run changes through the customer-approved change process.
</aside>
Run:
export KUBECONFIG=/path/to/customer-kubeconfig
export KUBE_CONTEXT=<customer-context>
kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" config current-context
kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" get nodes
kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" get namespace hyperplane-airbyte || kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" create namespace hyperplane-airbyte
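`kubectl config current-context` typically reports the kubeconfig's default context regardless of the `--context` flag, so it can be worth confirming that the context named in `KUBE_CONTEXT` actually exists before creating anything. A minimal sketch, where `context_exists` is a hypothetical helper rather than a kubectl feature:

```shell
# Return 0 if the named context is defined in the kubeconfig, 1 otherwise.
# context_exists is a hypothetical helper name.
context_exists() {
  kubectl --kubeconfig "$KUBECONFIG" config get-contexts -o name \
    | grep -Fxq -- "$1"
}

# Usage:
#   context_exists "$KUBE_CONTEXT" || {
#     echo "context not found in $KUBECONFIG: $KUBE_CONTEXT" >&2
#     exit 1
#   }
```

The exact, whole-line match (`grep -Fx`) avoids accepting a context whose name merely contains the one you typed.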
Run:
git clone --depth=1 --branch <release-branch> https://github.com/devsentient/monorepo.git /tmp/monorepo
cd /tmp/monorepo/stack-components/airbyte/helm
helm dependency update .
Run:
kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" create secret generic airbyte-secrets -n hyperplane-airbyte --from-literal=DATABASE_PASSWORD='<postgres-password>' --from-literal=AIRBYTE_SECRET_PERSISTENCE='KUBERNETES_SECRETS'
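`kubectl create secret` fails if the secret already exists, and a literal password typed on the command line ends up in shell history. One possible variant renders the Secret locally and pipes it to `kubectl apply`, so the step can be re-run safely. This is a sketch: `AIRBYTE_DB_PASSWORD` and `apply_airbyte_secret` are hypothetical names, and the password is still visible in the process arguments while the command runs.

```shell
# Create or update airbyte-secrets idempotently.
# AIRBYTE_DB_PASSWORD and apply_airbyte_secret are hypothetical names.
apply_airbyte_secret() {
  if [ -z "${AIRBYTE_DB_PASSWORD:-}" ]; then
    echo "set AIRBYTE_DB_PASSWORD first" >&2
    return 1
  fi
  # Render the Secret client-side, then apply it so re-runs do not fail.
  kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" \
    create secret generic airbyte-secrets -n hyperplane-airbyte \
    --from-literal=DATABASE_PASSWORD="$AIRBYTE_DB_PASSWORD" \
    --from-literal=AIRBYTE_SECRET_PERSISTENCE='KUBERNETES_SECRETS' \
    --dry-run=client -o yaml \
  | kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" apply -f -
}

# Usage (bash): prompt without echoing, then apply.
#   read -r -s -p 'Postgres password: ' AIRBYTE_DB_PASSWORD; echo
#   apply_airbyte_secret
```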
Run:
cat > /tmp/airbyte-values.yaml <<'EOF_VALUES'
# Confirm key nesting against the chart's own values.yaml before installing.
global:
  deploymentMode: oss
airbyte:
  webapp:
    enabled: true
  worker:
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 2
        memory: 4Gi
externalDatabase:
  enabled: true
  host: <postgres-host>
  database: airbyte
  user: airbyte
secrets:
  existingSecret: airbyte-secrets
ingress:
  enabled: true
  host: airbyte.<customer-domain>
EOF_VALUES
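Since the values file is written with placeholders, a quick check before installing can catch any that were missed. The sketch below is one way to do that; `find_placeholders` and `preflight_values` are hypothetical helper names, and the chart path is the one used elsewhere in this runbook.

```shell
# Print every line that still contains a <placeholder> token.
# Exit status follows grep: 0 if placeholders remain, 1 if none are found.
find_placeholders() {
  grep -nE '<[A-Za-z][A-Za-z0-9_-]*>' "$1"
}

# Refuse to continue while placeholders remain, then lint the chart
# against the values file.
preflight_values() {
  if find_placeholders "$1"; then
    echo "replace the placeholders listed above before installing" >&2
    return 1
  fi
  helm lint /tmp/monorepo/stack-components/airbyte/helm --values "$1"
}

# Usage:
#   preflight_values /tmp/airbyte-values.yaml
```

`helm lint` only checks that the chart and values render sensibly; it does not contact the cluster or validate that the database host is reachable.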
Run:
helm --kubeconfig "$KUBECONFIG" --kube-context "$KUBE_CONTEXT" upgrade --install airbyte /tmp/monorepo/stack-components/airbyte/helm \
  --namespace hyperplane-airbyte \
  --create-namespace \
  --values /tmp/airbyte-values.yaml \
  --timeout 15m \
  --wait
Run:
helm --kubeconfig "$KUBECONFIG" --kube-context "$KUBE_CONTEXT" status airbyte -n hyperplane-airbyte
kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" get pods,svc,pvc,ingress,virtualservice -n hyperplane-airbyte
kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" get events -n hyperplane-airbyte --sort-by=.lastTimestamp | tail -n 60
kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" logs -n hyperplane-airbyte -l app.kubernetes.io/instance=airbyte --tail=100
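Reading the pod list by eye works on a call, but a scripted check can be handy when the rollout takes a while. A minimal sketch, assuming the namespace from this runbook; `unsettled_pods` and `wait_for_pods` are hypothetical helper names. Note the limitation: a pod in a crash loop can still report the `Running` phase, so this complements the events and logs above rather than replacing them.

```shell
# Print pods whose phase is neither Running nor Succeeded.
unsettled_pods() {
  kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" get pods \
    -n hyperplane-airbyte --no-headers \
    --field-selector='status.phase!=Running,status.phase!=Succeeded'
}

# Poll until no unsettled pods remain or the attempts run out.
# Arguments: number of attempts (default 30), delay in seconds (default 10).
wait_for_pods() {
  local attempts="${1:-30}" delay="${2:-10}" pods
  while [ "$attempts" -gt 0 ]; do
    pods="$(unsettled_pods)" || return 1  # kubectl itself failed
    [ -z "$pods" ] && return 0
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  echo "pods still not settled:" >&2
  echo "$pods" >&2
  return 1
}

# Usage:
#   wait_for_pods 30 10
```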
Run:
kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" port-forward -n hyperplane-airbyte svc/airbyte-webapp-svc 8000:80
# In another terminal:
curl -I http://localhost:8000
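The manual `curl -I` check can also be wrapped so it yields a clear pass or fail. This is a sketch with hypothetical helper names; it treats any 2xx or 3xx response as success, which is an assumption: a deployment with authentication in front of the UI may legitimately answer 401.

```shell
# Treat 2xx and 3xx responses as a reachable UI.
is_ok_status() {
  case "$1" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}

# Fetch only the HTTP status code; curl prints 000 if the connection fails.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Report whether the port-forwarded UI answers.
smoke_test() {
  local url="${1:-http://localhost:8000}" code
  code="$(http_status "$url")"
  if is_ok_status "$code"; then
    echo "OK: $url returned $code"
  else
    echo "FAIL: $url returned $code" >&2
    return 1
  fi
}

# Usage (with the port-forward running in another terminal):
#   smoke_test http://localhost:8000
```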
Run:
helm --kubeconfig "$KUBECONFIG" --kube-context "$KUBE_CONTEXT" history airbyte -n hyperplane-airbyte
helm --kubeconfig "$KUBECONFIG" --kube-context "$KUBE_CONTEXT" rollback airbyte <REVISION> -n hyperplane-airbyte
kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" rollout status deployment/airbyte -n hyperplane-airbyte || true
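The last command checks a Deployment named `airbyte`, which may not exist depending on how the chart names its workloads, and `|| true` hides that case. A minimal sketch that instead checks whichever Deployments the namespace actually contains; `rollout_status_all` is a hypothetical helper name and the 5-minute timeout is an arbitrary choice.

```shell
# Check rollout status for every Deployment in the namespace.
# Returns non-zero if any rollout fails or times out.
rollout_status_all() {
  local failed=0 name
  for name in $(kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" \
      get deployments -n hyperplane-airbyte -o name); do
    kubectl --kubeconfig "$KUBECONFIG" --context "$KUBE_CONTEXT" \
      rollout status "$name" -n hyperplane-airbyte --timeout=5m || failed=1
  done
  return "$failed"
}

# Usage after a rollback:
#   rollout_status_all || echo "at least one deployment did not recover" >&2
```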