Stately Arrow Plugin

The arrow plugin provides data connectivity and SQL query execution capabilities using Apache Arrow and DataFusion. Connect to various data sources and run queries with streaming results.

Features

  • Multiple Backends: S3, GCS, Azure, ClickHouse, local filesystem
  • SQL Queries: Execute queries via DataFusion
  • Streaming Results: Arrow IPC streaming for large datasets
  • Connector Registry: Manage and register data sources
  • Schema Discovery: Browse catalogs, databases, and tables

Installation

Backend

cargo add stately-arrow

Feature flags for backends:

cargo add stately-arrow --features object-store,database,clickhouse

Frontend

npm install @statelyjs/arrow

Quick Start

Backend Setup

use axum::{extract::FromRef, Router};
use stately_arrow::{api, QueryContext, QueryState, ConnectorRegistry};
use std::sync::Arc;

// Create connector registry
let registry: Arc<dyn ConnectorRegistry> = create_your_registry();

// Create query context
let query_context = QueryContext::new(registry);

// Extract QueryState from your application state (ApiState) for the Arrow routes
impl FromRef<ApiState> for QueryState {
    fn from_ref(state: &ApiState) -> Self {
        QueryState::new(state.query_context.clone())
    }
}

// Add to router
pub fn app(state: ApiState) -> Router {
    Router::new()
        .nest("/arrow", api::router(state.clone()))
        .with_state(state)
}

Frontend Setup

import { statelyUi, statelyUiProvider, useStatelyUi } from '@statelyjs/stately';
import { type DefineConfig, type Schemas, stately } from '@statelyjs/stately/schema';
import { type ArrowPlugin, type ArrowUiPlugin, arrowPlugin, arrowUiPlugin } from '@statelyjs/arrow';

import openApiSpec from '../../openapi.json';
import { PARSED_SCHEMAS, type ParsedSchema } from '../generated/schemas';
import type { components, operations, paths } from '../generated/types';

// Define app schema with plugin extensions 
type AppSchemas = Schemas<
  DefineConfig<components, paths, operations, ParsedSchema>,
  readonly [ArrowPlugin]
>;

const schema = stately<AppSchemas>(openApiSpec, PARSED_SCHEMAS)
  .withPlugin(arrowPlugin());

// client, core, and options come from your base Stately UI setup
const runtime = statelyUi<AppSchemas, readonly [ArrowUiPlugin]>({ client, schema, core, options })
  .withPlugin(arrowUiPlugin({ api: { pathPrefix: '/api/data' } }));

API Endpoints

Method  Path              Description
GET     /connectors       List available connectors
POST    /connectors       Get details for multiple connectors
GET     /connectors/{id}  List connector contents
GET     /register         List registered connections
GET     /register/{id}    Register connector with DataFusion
GET     /catalogs         List DataFusion catalogs
POST    /query            Execute SQL query (streaming response)
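
You can also exercise the endpoints directly with fetch. A minimal sketch, assuming the router is mounted at /arrow as in the backend setup above (adjust the prefix to match your deployment, e.g. the /api/data prefix used in the frontend config):

// List available connectors
const connectors = await fetch('/arrow/connectors').then((res) => res.json());

// Execute a SQL query; the response body is an Arrow IPC stream
const response = await fetch('/arrow/query', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    sql: 'SELECT * FROM my_table LIMIT 10',
    connector_id: 'my-connector',
  }),
});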

Backend Connectors

Object Store

Connect to cloud object stores:

use stately_arrow::object_store::{Config, ObjectStore, ObjectStoreFormat};

let config = Config {
    format: ObjectStoreFormat::Parquet(None),
    store: ObjectStore::S3 {
        bucket: "my-bucket".into(),
        region: Some("us-east-1".into()),
        // Credentials from environment
    },
};

Supported stores:

  • Amazon S3
  • Google Cloud Storage
  • Azure Blob Storage
  • Local filesystem

ClickHouse

Connect to ClickHouse databases (uses clickhouse-datafusion under the hood):

use stately_arrow::database::clickhouse::{Config, ConnectionOptions};

let config = Config {
    options: ConnectionOptions {
        endpoint: "http://localhost:8123".into(),
        username: Some("default".into()),
        password: None,
        tls: false,
    },
    // ...
};

Streaming Queries

Queries return results as streaming Arrow IPC:

// Backend streams RecordBatches
POST /query
Content-Type: application/json
{ "sql": "SELECT * FROM my_table LIMIT 1000", "connector_id": "my-connector" }

// Response: application/vnd.apache.arrow.stream

The frontend handles streaming automatically:

import { useStreamingQuery } from '@statelyjs/arrow/hooks';
import { ArrowTable } from '@statelyjs/arrow/components';

function QueryView() {
  const { execute, data, isStreaming, stats } = useStreamingQuery();
  
  const runQuery = () => {
    execute({
      sql: 'SELECT * FROM my_table',
      connectorId: 'my-connector',
    });
  };
  
  return (
    <div>
      <button onClick={runQuery}>Run Query</button>
      {isStreaming && <p>Streaming... {stats.rowCount} rows</p>}
      <ArrowTable data={data} />
    </div>
  );
}
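
Because the response is a standard Arrow IPC stream, it can also be consumed outside the provided hooks with the apache-arrow package. A minimal sketch, assuming the /arrow/query path from the examples above; note that tableFromIPC buffers the entire result, so prefer useStreamingQuery for large result sets:

import { tableFromIPC } from 'apache-arrow';

// Fetch the query endpoint and decode the Arrow IPC response into a Table
const table = await tableFromIPC(
  fetch('/arrow/query', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ sql: 'SELECT * FROM my_table LIMIT 1000', connector_id: 'my-connector' }),
  }),
);

console.log(table.numRows, table.schema.fields.map((f) => f.name));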

Support for more sophisticated pagination and for streaming very large datasets is under development. Check back for updates.

Frontend Components

ArrowViewer

Full-featured data exploration and query page:

import { ArrowViewer } from '@statelyjs/arrow/pages';

function DataPage() {
  return <ArrowViewer />;
}

QueryEditor

SQL editor with syntax highlighting:

import { QueryEditor } from '@statelyjs/arrow/components';

function MyComponent() {
  return (
    <QueryEditor
      value={sql}
      onChange={setSql}
      onExecute={handleExecute}
    />
  );
}

ArrowTable

High-performance data table:

import { ArrowTable } from '@statelyjs/arrow/components';

function Results({ data }) {
  return <ArrowTable data={data} />;
}

ConnectorMenuCard

Connector browser with schema navigation:

import { ConnectorMenuCard } from '@statelyjs/arrow/views';

function Sidebar() {
  return (
    <ConnectorMenuCard
      onTableSelect={(table) => insertIntoQuery(table)}
    />
  );
}

Hooks

import {
  useStreamingQuery,
  useConnectors,
  useListConnectors,
  useRegisterConnection,
  useListCatalogs,
} from '@statelyjs/arrow/hooks';

function MyComponent() {
  const { connectors } = useListConnectors();
  const { register } = useRegisterConnection();
  const { execute, data, stats } = useStreamingQuery();
  
  // ...
}

The Backend Trait

Add support for custom data sources by implementing the Backend trait:

#[async_trait]
pub trait Backend: Send + Sync {
    /// Metadata describing this connection.
    fn connection(&self) -> &ConnectionMetadata;

    /// Prepare a DataFusion session for querying this backend,
    /// e.g. by registering tables or object stores.
    async fn prepare_session(&self, session: &SessionContext) -> Result<()>;

    /// List the backend's contents (optionally scoped to a database)
    /// for schema discovery.
    async fn list(&self, database: Option<&str>) -> Result<ListSummary>;
}