
Pipelines: Real-World PyRun Examples 🧪

PyRun Pipelines are ready-to-run, end-to-end examples demonstrating practical applications of PyRun with frameworks like Lithops and Dask. Instead of building from scratch, launch a pipeline to see PyRun solve a real-world problem, then explore the underlying code and configuration. Pipelines are excellent for learning best practices and discovering PyRun's capabilities.

What are PyRun Pipelines?

Each pipeline represents a complete, pre-configured project:

  • Defined Use Case: Addresses a specific task (e.g., performance benchmarking, scientific data analysis, hyperparameter tuning).
  • Pre-configured Runtime: Includes all necessary Python packages and dependencies, automatically built by PyRun.
  • Ready-to-Run Code: Contains the complete script(s) needed for the task.
  • Data Integration: May include sample data or connect to public data sources.
  • Automated Execution: PyRun handles launching the job on your AWS account and directing you to monitoring.
  • Code Access (Post-Execution): After running, you can typically access the workspace containing the pipeline's code to understand how it works.
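In code terms, the "Ready-to-Run Code" at the heart of a pipeline is usually a worker function mapped over many inputs in parallel. The sketch below is a hypothetical, minimal example of that shape; it uses Python's standard library `concurrent.futures` as a local stand-in for a Lithops executor, and `flops_task` is an illustrative toy kernel, not code from an actual pipeline.

```python
# Minimal sketch of the shape a pipeline driver script takes:
# define a worker function, map it over inputs in parallel, collect results.
# concurrent.futures stands in here for a Lithops FunctionExecutor.
from concurrent.futures import ThreadPoolExecutor

def flops_task(n):
    """Toy compute task (sum of squares), standing in for a real FLOPS kernel."""
    return sum(i * i for i in range(n))

inputs = [10_000, 20_000, 30_000]

with ThreadPoolExecutor(max_workers=3) as pool:
    # With Lithops this would be: fexec.map(flops_task, inputs)
    results = list(pool.map(flops_task, inputs))

print(results)
```

The same define-map-collect structure appears across the pipelines listed below; only the worker function and the data it consumes change.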

Available Pipelines

Here's a selection of available pipelines:

| Name | Description | Framework Used | Key Concepts Shown | Data Format | Est. Complexity |
| --- | --- | --- | --- | --- | --- |
| FLOPS benchmark | Measures floating-point computation performance using parallel functions. | Lithops | Basic map, performance measurement | In-memory | Low |
| METASPACE Annotation | Metabolite annotation pipeline for mass spectrometry imaging data (imzML), involving multiple processing stages. | Lithops | Multi-step workflow, complex data handling, imzML | imzML, Databases | High |
| Mandelbrot Classic | Generates the Mandelbrot set fractal using parallel computation across a defined space. | Lithops | Parallel computation, image generation (map) | Numerical Grid | Medium |
| Hyperparameter Tuning | Performs grid-search hyperparameter tuning for a text classification model on Amazon reviews data. | Lithops | ML preprocessing, grid search, parallel model evaluation | Compressed Text | Medium |
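To make the Hyperparameter Tuning entry concrete, the grid-search pattern it describes can be sketched as follows. This is a hedged illustration, not the pipeline's actual code: `evaluate` is a toy stand-in for training and scoring a model, and `ThreadPoolExecutor` stands in for the Lithops executor that would run the evaluations in the cloud.

```python
# Sketch of grid-search hyperparameter tuning: evaluate every parameter
# combination in parallel, then keep the best-scoring one.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(params):
    """Stand-in evaluation; a real pipeline would train and score a model."""
    lr, reg = params
    score = -(lr - 0.1) ** 2 - (reg - 0.01) ** 2  # toy objective, peaks at (0.1, 0.01)
    return params, score

# Every combination of learning rate and regularization strength.
grid = list(product([0.01, 0.1, 1.0], [0.001, 0.01, 0.1]))

with ThreadPoolExecutor() as pool:
    # With Lithops this would be: fexec.map(evaluate, grid)
    results = list(pool.map(evaluate, grid))

best_params, best_score = max(results, key=lambda r: r[1])
print(best_params)  # (0.1, 0.01)
```

Because each parameter combination is evaluated independently, the search parallelizes trivially, which is what makes it a natural fit for a serverless map.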

Running a Pipeline

  1. Navigate to Pipelines: Find "Pipelines" in the PyRun sidebar.

  2. Browse: Review the available pipelines and their descriptions.

  3. Select & Launch: Click the "Run" button for the desired pipeline.

  4. Monitor: PyRun automatically initiates the execution on your AWS account and shows you the Real-Time Monitoring page for that job. Watch the logs and metrics.

    *(Screenshot: Pipeline Selection UI)*

Exploring Pipeline Code

  • Access Workspace: Once a pipeline run completes (or, for some pipelines, while it is still running), PyRun provides a link or other way to open the associated Workspace.
  • Examine Files: Inside the workspace, you'll find the Python scripts (.py), runtime definition (.pyrun/environment.yml), and any other configuration files used by the pipeline.
  • Learn: Study how the code utilizes Lithops or Dask, how data is handled, and how the workflow is structured. This is a great way to learn practical techniques.
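The runtime definition mentioned above (`.pyrun/environment.yml`) follows the familiar conda-style environment format. The fragment below is purely illustrative; the name and package list are hypothetical, and each pipeline ships its own actual file:

```yaml
# .pyrun/environment.yml — illustrative sketch, not an actual pipeline file
name: example-pipeline
dependencies:
  - python=3.11
  - numpy
  - pip
  - pip:
      - lithops
```

Comparing this file across pipelines is a quick way to see which packages each use case actually needs.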

Benefits of Using Pipelines

  • Learn by Doing: See concrete examples of how PyRun works for real tasks.
  • Quick Evaluation: Assess PyRun's suitability for different types of problems without initial coding effort.
  • Best Practices: Observe effective patterns for structuring cloud-based Python applications.
  • Code Templates: Adapt code snippets or entire structures from pipelines for your own projects.
  • Discover Features: See integrations like Data Cockpit or specific framework features in action.

More Pipelines Coming!

We are continuously developing and adding new pipelines covering more domains (like advanced ML, geospatial analysis, genomics) and showcasing more PyRun features and framework integrations. Check back often!

Start exploring PyRun Pipelines today and accelerate your journey into effortless cloud computing!