GitHub - mabel-dev/orso: 🐻 Orso is a row-based Python DataFrame library
Overview
Orso is not intended to compete with Polars or Pandas (or your favorite bear DataFrame technology); instead, it is developed as a common layer for Mabel and Opteryx.
Key Use Cases:
- In Opteryx, Orso provides most of the database Cursor functionality
- In Mabel, Orso provides the data schema and validation functionality
Orso DataFrames are row-based, a design driven by their initial target use cases as the write-ahead log (WAL) for Mabel and the Cursor for Opteryx. Each row in an Orso DataFrame can be quickly converted to a tuple of values, a dictionary, or a byte representation.
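To make the row-based idea concrete, here is a minimal, standalone sketch of a row that converts to a tuple, a dictionary, and bytes. It does not use Orso and is not Orso's implementation: the `SketchRow` class, its method names, and the JSON byte encoding are all assumptions chosen for illustration.

```python
import json
from typing import Any, Dict, Tuple


class SketchRow(tuple):
    """Hypothetical row type: a tuple of values plus shared column names."""

    # Column names live on the class, so each row stores only its values.
    _columns: Tuple[str, ...] = ()

    @classmethod
    def for_columns(cls, *columns: str) -> type:
        """Build a row class bound to a fixed set of column names."""
        return type("BoundSketchRow", (cls,), {"_columns": tuple(columns)})

    def as_dict(self) -> Dict[str, Any]:
        """Pair each value with its column name."""
        return dict(zip(self._columns, self))

    def to_bytes(self) -> bytes:
        """Serialize the values as JSON (an assumed encoding, for illustration only)."""
        return json.dumps(list(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, data: bytes) -> "SketchRow":
        """Rebuild a row from the bytes produced by to_bytes."""
        return cls(json.loads(data.decode("utf-8")))


PersonRow = SketchRow.for_columns("name", "age", "city")
row = PersonRow(("Alice", 30, "New York"))

values = tuple(row)                       # plain tuple of values
record = row.as_dict()                    # column name -> value
payload = row.to_bytes()                  # byte representation
restored = PersonRow.from_bytes(payload)  # round trip back to a row
```

Keeping the column names on the class rather than on each row is one way to keep individual rows small, which matters when rows are written and read one at a time.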
Installation
Install Orso from PyPI:

    pip install orso
Quick Start
Creating a DataFrame
    import orso

    # Create from list of dictionaries
    df = orso.DataFrame([
        {'name': 'Alice', 'age': 30, 'city': 'New York'},
        {'name': 'Bob', 'age': 25, 'city': 'San Francisco'},
        {'name': 'Charlie', 'age': 35, 'city': 'Chicago'}
    ])

    print(f"Created DataFrame with {df.rowcount} rows and {df.columncount} columns")
Displaying Data
    # Display the DataFrame
    print(df.display())

    # Convert to different formats
    arrow_table = df.arrow()    # PyArrow Table
    pandas_df = df.pandas()     # Pandas DataFrame
Working with Schema
    # Access column names
    print("Columns:", df.column_names)

    # Access schema information
    print("Schema:", df.schema)
Converting Between Formats
    # From PyArrow
    import pyarrow as pa

    arrow_table = pa.table({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
    orso_df = orso.DataFrame.from_arrow(arrow_table)

    # To Pandas
    pandas_df = orso_df.pandas()
Features
- Lightweight: Minimal overhead for tabular data operations
- Row-based: Optimized for row-oriented operations
- Interoperable: Easy conversion to/from PyArrow, Pandas
- Schema-aware: Built-in data validation and type checking
- Fast serialization: Efficient conversion to bytes, tuples, and dictionaries
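As an illustration of what schema-aware validation can look like for row-oriented data, the sketch below checks a dictionary row against a list of expected column types. It is a standalone example that does not call Orso; the `validate_row` function and the `(name, type)` schema shape are hypothetical and are not Orso's schema API.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical schema: ordered (column name, expected Python type) pairs.
Schema = Sequence[Tuple[str, type]]


def validate_row(row: Dict[str, Any], schema: Schema) -> List[str]:
    """Return a list of problems found in `row`; an empty list means valid."""
    problems: List[str] = []
    expected = {name for name, _ in schema}

    for name, expected_type in schema:
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], expected_type):
            actual = type(row[name]).__name__
            problems.append(f"{name}: expected {expected_type.__name__}, got {actual}")

    # Columns not declared in the schema are reported rather than dropped.
    for name in row:
        if name not in expected:
            problems.append(f"unexpected column: {name}")

    return problems


person_schema = [("name", str), ("age", int), ("city", str)]

good = validate_row({"name": "Alice", "age": 30, "city": "New York"}, person_schema)
bad = validate_row({"name": "Bob", "age": "25", "country": "US"}, person_schema)
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a row at once.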
API Reference
DataFrame Class
The main DataFrame class provides the following key methods:
- DataFrame(dictionaries=None, *, rows=None, schema=None) - Constructor
- display(limit=5, colorize=True, show_types=True) - Pretty print the DataFrame
- arrow(size=None) - Convert to PyArrow Table
- pandas(size=None) - Convert to Pandas DataFrame
- from_arrow(tables) - Create DataFrame from PyArrow Table(s)
- fetchall() - Get all rows as list of Row objects
- collect() - Materialize the DataFrame
- append(other) - Append another DataFrame
- distinct() - Get unique rows
Properties
- rowcount - Number of rows
- columncount - Number of columns
- column_names - List of column names
- schema - Schema information
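The properties above are enough to write small helpers that work on any DataFrame-like object. The sketch below is a hypothetical `summarize` function that reads only `rowcount`, `columncount`, and `column_names`; it uses a stand-in object so that it runs without Orso installed, and it assumes `column_names` is a sequence of strings.

```python
from types import SimpleNamespace


def summarize(df) -> str:
    """Describe any object exposing rowcount, columncount and column_names."""
    columns = ", ".join(df.column_names)
    return f"{df.rowcount} rows x {df.columncount} columns ({columns})"


# Stand-in object so the sketch runs without Orso installed; with Orso,
# an orso.DataFrame would be passed instead.
stub = SimpleNamespace(rowcount=3, columncount=3, column_names=["name", "age", "city"])
summary = summarize(stub)
```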
Development
Building from Source
    # Clone the repository
    git clone https://github.com/mabel-dev/orso.git
    cd orso

    # Install dependencies
    pip install -r requirements.txt
    pip install -r tests/requirements.txt

    # Build Cython extensions
    make compile

    # Run tests
    make test
Contributing
Orso is part of the Mabel ecosystem. Contributions are welcome! Please ensure:
- All tests pass: make test
- Code follows the project style: make lint
- New features include appropriate tests
- Documentation is updated for API changes
Performance Benchmarking
Orso includes a comprehensive performance benchmark suite to compare different versions:
    # Run full benchmark suite
    python tests/test_benchmark_suite.py

    # Compare two versions
    python tests/test_benchmark_suite.py -o baseline.json
    # <switch version>
    python tests/test_benchmark_suite.py -o current.json -c baseline.json
See BENCHMARK_SUITE.md for detailed documentation.
License
Orso is licensed under Apache 2.0 unless explicitly indicated otherwise.
Status
Orso is in beta. Beta means different things to different people; to us, being beta means:
- Interfaces are generally stable but may still have breaking changes
- Unit tests are not reliable enough to capture breaks to functionality
- Bugs are likely to exist in edge cases
- Code may not be tuned for performance
As such, we really don't recommend using Orso in critical applications.