Tools exposed by the OpenHEXA MCP server
OpenHEXA v0.0.1 · Protocol 2025-03-26 · 25 tools
List connections (external data sources) configured in a workspace. Returns each connection's name, slug, type (S3, GCS, POSTGRESQL, DHIS2, IASO, CUSTOM), and fields. Field values are hidden so that credentials are not used directly. Connections are used as parameters when running pipelines — use the connection slug as the parameter value.
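When a pipeline parameter expects a connection, the run config simply carries the connection slug as a string. A minimal sketch (the parameter code "dhis2_connection" and slug "dhis2-play" are hypothetical, for illustration only):

```python
import json

# The pipeline resolves the slug to the actual connection at runtime;
# field values (credentials) never appear in the config itself.
config = json.dumps({"dhis2_connection": "dhis2-play"})
```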
List datasets in a workspace. Returns dataset summaries. Use get_dataset with the dataset slug to get full details including versions and files.
Get full details of a dataset: metadata, permissions, all versions with their files, and the latest version's file list. Use a file 'id' from the response with preview_dataset_file to see sample data. Use the dataset 'id' with create_dataset_version to add a new version.
Preview the content of a dataset file by its ID (from get_dataset's file list). Returns a sample of the data for tabular files (CSV, Parquet, etc.), file properties, and metadata. The sample status can be PROCESSING (still generating), FINISHED (sample ready), or FAILED.
Create a new dataset in a workspace with an initial version (v1) containing the provided files. The files_json parameter is a JSON array of {uri, contentType, content} objects, e.g. '[{"uri": "data.csv", "contentType": "text/csv", "content": "a,b\n1,2"}]'. Use create_dataset_version to add more versions later.
Create a new version of a dataset with optional inline files. Requires the dataset ID (from get_dataset or create_dataset) and a version name (e.g. 'v1', '2024-01'). Optionally provide a changelog describing what changed. To include files, provide files_json as a JSON array of {uri, contentType, content} objects, e.g. '[{"uri": "data.csv", "contentType": "text/csv", "content": "a,b\n1,2"}]'.
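A files_json payload like the ones above is safer to assemble with json.dumps than by hand-writing the string, since content with quotes or newlines must be escaped. A minimal sketch following the {uri, contentType, content} shape described above:

```python
import json

def build_files_json(files):
    """Serialize (uri, content_type, content) triples into the files_json
    string expected by create_dataset / create_dataset_version."""
    return json.dumps([
        {"uri": uri, "contentType": content_type, "content": content}
        for uri, content_type, content in files
    ])

files_json = build_files_json([("data.csv", "text/csv", "a,b\n1,2")])
```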
List files and directories in a workspace bucket. Use prefix to browse subdirectories (e.g. 'data/'). Returns file name, path, type (file/directory), size, and last update. Use read_file with the file path to read text content.
Read the content of a text file from a workspace bucket. Only works for UTF-8 text files up to 1 MB (includes .py, .csv, .json, .ipynb, .sql, .txt, etc.). Use list_files first to check file size and path. For Jupyter notebooks (.ipynb), the content is JSON that you can parse to read/modify cells. Use start_line and end_line (1-indexed, inclusive) to read a specific range of lines instead of the entire file.
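Because read_file returns .ipynb content as plain JSON, notebook cells can be inspected with the standard library. A sketch, where notebook_json is a hypothetical stand-in for read_file's output:

```python
import json

# Hypothetical minimal notebook standing in for read_file output.
notebook_json = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["print('hi')"], "outputs": []},
    ],
    "nbformat": 4,
    "nbformat_minor": 5,
})

nb = json.loads(notebook_json)
# Each cell's source is a list of lines; join them to get the full text.
code_sources = [
    "".join(cell["source"])
    for cell in nb["cells"]
    if cell["cell_type"] == "code"
]
```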
Write text content to a new file in a workspace bucket. Fails if the file already exists. Maximum 1 MB. For Jupyter notebooks, provide valid .ipynb JSON content. Requires createObject permission on the workspace.
Call this tool when you are stuck, unsure what to do next, or need guidance on OpenHEXA. Provide a reason describing why you need help (e.g. 'unsure which tool to use', 'pipeline failed', 'cannot find dataset').
List pipelines in a workspace. Returns pipeline summaries (id, code, name, description). Use get_pipeline with the pipeline code to get full details including source code and run history.
Get full details of a pipeline: metadata, schedule, permissions, current version source code with all files, parameters, and recent runs. Use the returned 'id' field when calling run_pipeline or update_pipeline. Use a run 'id' from the runs list with get_pipeline_run to inspect outputs and logs.
Get detailed information about a specific pipeline run. Returns status, configuration used, messages (warnings/errors), outputs (files, database tables), and execution logs. Use this after run_pipeline to check results, or to inspect any run from get_pipeline's runs list.
Run a pipeline. Requires the pipeline UUID (from get_pipeline's 'id' field) and a JSON config string mapping parameter codes to values. Check the pipeline's parameters with get_pipeline first to see required parameters and their types. Example config: '{"param1": "value1", "param2": 42}'. Returns the created run's ID — use get_pipeline_run to monitor progress and get results.
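The config string is ordinary JSON, so it is best built with json.dumps rather than string concatenation. A minimal sketch; the parameter codes are illustrative — check get_pipeline for the real ones:

```python
import json

def build_run_config(params):
    """Serialize a mapping of parameter codes to values into the
    JSON config string run_pipeline expects."""
    return json.dumps(params)

config = build_run_config({"param1": "value1", "param2": 42})
```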
Update a pipeline's properties. Provide the pipeline UUID (from get_pipeline's 'id' field) and any fields to change. For schedule, use a CRON expression (minute hour day-of-month month day-of-week), e.g. '0 6 * * 1' for Mondays at 6AM, '0 */2 * * *' for every 2 hours. Pass schedule='none' to disable scheduling. Only provided non-empty fields are updated.
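A quick sanity check before passing a schedule: a valid expression has exactly five fields in the order given above. This is a purely illustrative validator, not part of the tool's API:

```python
CRON_FIELDS = ("minute", "hour", "day-of-month", "month", "day-of-week")

def describe_cron(expr):
    """Split a 5-field CRON expression into named fields;
    raises ValueError on the wrong field count."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))

weekly = describe_cron("0 6 * * 1")  # Mondays at 06:00
```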
Create a new pipeline in the current workspace. Optionally upload Python source code as the first version (v1). Always provide a meaningful description summarizing what the pipeline does; if the pipeline has no clear purpose or is blank, use "" as the description. Only name, description, and functional_type are supported at creation time. If source_code is omitted, the pipeline is created without any version. The source_code must follow this structure:

from openhexa.sdk import current_run, pipeline

@pipeline("Simple ETL")
def simple_etl():
    count = task_1()
    task_2(count)

@simple_etl.task
def task_1():
    current_run.log_info("In task 1...")
    return 42

@simple_etl.task
def task_2(count):
    current_run.log_info(f"In task 2... count is {count}")

if __name__ == "__main__":
    simple_etl()
Upload a new version of an existing pipeline. Requires the workspace slug, the pipeline code (from get_pipeline), and the Python source code for the new version. Optionally provide a version name and description. The version number is auto-incremented. The source_code must follow the OpenHEXA SDK structure (use @pipeline and @task decorators). Use get_pipeline first to read the current source code, then modify it and pass it here. Returns the created version details including id, version number, and parsed parameters.
List available pipeline templates. Optionally filter by search query. Templates are reusable pipeline blueprints. Workflow: list_pipeline_templates -> get_pipeline_template (to review code) -> create_pipeline_from_template (to instantiate in a workspace).
Get full details of a pipeline template including its description, config, version history, and the current version's source code and parameters. Use the currentVersion.id as the template_version_id when calling create_pipeline_from_template.
Create a new pipeline in a workspace from a template version. Use get_pipeline_template first to find the template_version_id (the currentVersion.id). The new pipeline will have the template's code, parameters, and configuration pre-configured.
List static web apps in a workspace. The returned URL can be used to access each webapp in a browser.
Create a static web app in a workspace. Provide files_json as a JSON array of {path, content} objects, e.g. '[{"path": "index.html", "content": "<html>...</html>"}, {"path": "style.css", "content": "body { ... }"}]'. An index.html file is required at minimum. Returns the webapp URL to access it in a browser.
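As with datasets, the files_json array is easiest to build programmatically. A minimal sketch following the {path, content} shape described above, with an up-front check for the required index.html:

```python
import json

def build_webapp_files_json(files):
    """Serialize a {path: content} mapping into the files_json array
    create_static_webapp expects; index.html is required."""
    if "index.html" not in files:
        raise ValueError("an index.html file is required")
    return json.dumps([{"path": p, "content": c} for p, c in files.items()])

files_json = build_webapp_files_json({
    "index.html": "<html><body>Hello</body></html>",
    "style.css": "body { margin: 0; }",
})
```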
Update an existing static web app. Provide the webapp UUID (from list_static_webapps) and any fields to change. To update files, provide files_json as a JSON array of {path, content} objects — this replaces all files. Only provided non-empty fields are updated.
List workspaces accessible to the current user. Optionally filter by name using the query parameter. This is typically the first tool to call to discover available workspaces before accessing pipelines, datasets, or files.
Get details of a specific workspace by its slug. Returns workspace metadata and permissions. Use list_workspaces first if you don't know the slug.