Pipelines: Real-World PyRun Examples 🧪

PyRun Pipelines are ready-to-run, end-to-end examples demonstrating practical applications of PyRun with frameworks like Lithops and Dask. Instead of building from scratch, launch a pipeline to see PyRun solve a real-world problem, then explore the underlying code and configuration. Pipelines are excellent for learning best practices and discovering PyRun's capabilities.

What are PyRun Pipelines?

Each pipeline represents a complete, pre-configured project:

  • Defined Use Case: Addresses a specific task (e.g., performance benchmarking, scientific data analysis, hyperparameter tuning).
  • Pre-configured Runtime: Includes all necessary Python packages and dependencies, automatically built by PyRun.
  • Ready-to-Run Code: Contains the complete script(s) needed for the task.
  • Data Integration: May include sample data or connect to public data sources.
  • Automated Execution: PyRun handles launching the job on your AWS account and directing you to monitoring.
  • Code Access (Post-Execution): After running, you can typically access the workspace containing the pipeline's code to understand how it works.

Available Pipelines

Here's a selection of available pipelines (a minimal code sketch of the simplest pattern follows the table):

| Name | Description | Framework Used | Key Concepts Shown | Data Format | Est. Complexity |
| --- | --- | --- | --- | --- | --- |
| Audio Recognition | AI pipeline for audio keyword recognition using TensorFlow. | TensorFlow, NumPy, Matplotlib | Audio recognition, automatic speech recognition (ASR), spectrogram generation (STFT), CNN, TensorFlow Datasets | WAV, spectrogram (derived), TensorFlow Dataset | Medium |
| CMIP6 ECS Estimation (Gregory Method) | Applies the Gregory method to CMIP6 climate model data to estimate Equilibrium Climate Sensitivity (ECS). | Dask | Climate model analysis (CMIP6), parallel cloud computing, Gregory method, time-series analysis | CSV, Zarr | High |
| CMIP6 Precipitation Frequency Change | Dask pipeline analyzing CMIP6 climate model data to quantify changes in precipitation frequency. | Dask | Climate model analysis (CMIP6), parallel cloud computing, scientific data analysis (xarray) | CSV, Zarr | High |
| Dask | Starter template (description pending). | Dask | N/A | N/A | N/A |
| Dask Machine Learning Example | Demonstrates distributed machine learning using Dask-ML. | Dask, Dask-ML, Scikit-learn, Joblib, Pandas | Distributed machine learning, hyperparameter tuning (GridSearchCV), clustering (KMeans) | NumPy arrays, Dask arrays | Medium |
| FLOPS benchmark | Measures floating-point computation performance using parallel functions. | Lithops | Basic map, performance measurement | In-memory tuples | Low |
| Hyperparameter tuning | Performs grid-search hyperparameter tuning for a text classification model on Amazon reviews data. | Lithops | ML preprocessing, grid search, parallel model evaluation | Compressed text | Medium |
| Image Classification | AI pipeline for image classification using TensorFlow. | TensorFlow, Keras, Matplotlib, NumPy, PIL | Image classification, CNN, data augmentation, overfitting mitigation | JPG, TensorFlow Dataset | Medium to High |
| Kerchunk | Generates Kerchunk references from remote NetCDF files, enabling virtual dataset access via xarray. | Dask | Parallel metadata generation, dataset aggregation, scientific data analysis (xarray) | NetCDF, Zarr (virtual) | Medium |
| Lithops | Starter template (description pending). | Lithops | N/A | N/A | N/A |
| Lithops TeraSort Benchmark | Serverless pipeline that generates and sorts large synthetic datasets using TeraGen and TeraSort over Lithops, executing a distributed MapReduce-style sort across cloud functions. | Lithops, Polars, NumPy | Serverless parallel data generation, distributed sorting, multipart upload to object storage, map-reduce orchestration, warm-up strategy, function runtime tuning | Plain-text (ASCII) or binary records; output stored in object storage (e.g., S3); optional multipart files | Medium |
| LLM Execution with Ollama | Interactive AI notebook for running Large Language Models (LLMs) locally using Ollama. | Ollama | Local LLM execution, Ollama integration, interactive prompting, streaming API, model management | Text (user prompts, LLM responses) | Low to Medium |
| Mandelbrot classic | Generates the Mandelbrot set fractal using parallel computation across a defined space. | Lithops | Parallel computation, image generation (map) | Numerical grid | Medium |
| METASPACE Metabolite annotation pipeline | Showcases metabolite annotation of an imzML dataset. | Lithops | Multi-step workflow, complex data handling, imzML | imzML, databases | High |
| Model calculation | Notebook implementing a model calculation process that consumes LAZ files. | Python 3.10 | LIDAR tile processing, .laz format | .laz | N/A |
| NDVI Temporal Change Detection Pipeline | Serverless pipeline that computes NDVI and its change over time using Sentinel-2 imagery. | Lithops, Rasterio, Matplotlib | NDVI calculation, temporal analysis, geospatial image tiling, parallel processing, change detection | GeoTIFF (Sentinel-2 bands), JPG (NDVI visualization), S3 paths | Medium to High |
| Vorticity Workload with Cubed | Serverless pipeline for computing and analyzing vorticity over synthetic oceanographic velocity fields with Cubed. | Cubed, Xarray, Zarr, NumPy | Lazy computation, first-order derivatives, chunked ND-array processing, map-overlap optimization, Zarr storage | Zarr (input/output), NumPy arrays | Medium |
| Water consumption | Serverless pipeline that estimates evapotranspiration using DTMs and meteorological data. | Lithops, GRASS GIS, Rasterio | Distributed geospatial processing, IDW interpolation, solar radiation modeling, evapotranspiration | GeoTIFF, CSV (SIAM), ZIP (shapefile) | Medium to High |
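
To give a sense of the simplest pattern in the table (the kind of parallel map used by the FLOPS benchmark and Mandelbrot pipelines), here is a minimal Lithops sketch. The worker function and its inputs are illustrative placeholders, not any pipeline's actual code, and it assumes a Lithops backend is already configured:

```python
import lithops

def count_flops(n):
    # Illustrative worker: perform n floating-point multiplications
    # and report how many were done.
    x = 1.000001
    for _ in range(n):
        x *= 1.000001
    return n

# FunctionExecutor uses whatever backend your Lithops config defines.
fexec = lithops.FunctionExecutor()
fexec.map(count_flops, [10_000_000] * 8)  # fan out 8 parallel tasks
results = fexec.get_result()              # gather results from the workers
print(f"Total floating-point operations: {sum(results):,}")
```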

Running a Pipeline

  1. Navigate to Pipelines: Find "Pipelines" in the PyRun sidebar.

  2. Browse: Review the available pipelines and their descriptions.

  3. Select & Launch: Click the "Run" button for the desired pipeline.

  4. Monitor: PyRun automatically initiates the execution on your AWS account and shows you the Real-Time Monitoring page for that job. Watch the logs and metrics.

    [Screenshot: Pipeline Selection UI]

Exploring Pipeline Code

  • Access Workspace: Once a pipeline execution completes (or, for some pipelines, while it is still running), PyRun usually provides a link to open the associated Workspace.
  • Examine Files: Inside the workspace you'll find the Python scripts (.py), the runtime definition (.pyrun/environment.yml), and any other configuration files the pipeline uses.
  • Learn: Study how the code uses Lithops or Dask, how data is handled, and how the workflow is structured; the sketch below illustrates the kind of pattern you'll encounter. This is a great way to pick up practical techniques.
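
As an illustration of that last point, here is a minimal sketch of the lazy-evaluation pattern common to the Dask pipelines above. The array shape and chunk sizes are arbitrary placeholders, not values from any actual pipeline:

```python
import dask.array as da

# Build a lazy, chunked 2-D array; no data is materialized yet.
x = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))

# Composing operations only extends the task graph...
mean_of_squares = (x ** 2).mean()

# ...until .compute() executes it in parallel, chunk by chunk.
print(mean_of_squares.compute())
```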

Benefits of Using Pipelines

  • Learn by Doing: See concrete examples of how PyRun works for real tasks.
  • Quick Evaluation: Assess PyRun's suitability for different types of problems without initial coding effort.
  • Best Practices: Observe effective patterns for structuring cloud-based Python applications.
  • Code Templates: Adapt code snippets or entire structures from pipelines for your own projects.
  • Discover Features: See integrations like Data Cockpit or specific framework features in action.

More Pipelines Coming!

We are continuously developing and adding new pipelines covering more domains (like advanced ML, geospatial analysis, genomics) and showcasing more PyRun features and framework integrations. Check back often!

Start exploring PyRun Pipelines today and accelerate your journey into effortless cloud computing!