
feat: add fastCRW tool block by us · Pull Request #5025 · simstudioai/sim

Greptile Summary

This PR adds fastCRW as a new tool block (scrape / crawl / map / search), mirroring the existing Firecrawl block. The integration is additive-only: new files under tools/crw/ and blocks/blocks/crw.ts, plus registration in the block/tool registries, BYOK keys, CSP allowlist, icon, and integrations.json.

  • Four tool configs (crw_scrape, crw_search, crw_crawl, crw_map) mirror Firecrawl's structure with fastCRW-specific differences: maxPages instead of limit for crawl, a dynamic baseUrl param for self-hosting, and a resolveCrwBaseUrl helper.
  • Registration is complete across all required locations (BYOK schema, type union, CSP, icon mapping, integrations JSON), and a test file covers URL construction, body building, and response transformation for all four operations.
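The crawl difference called out above (maxPages where Firecrawl uses limit) is easiest to see in the request body. The sketch below is hypothetical: the CrwCrawlParams shape and buildCrawlBody name are illustrative, and it assumes fastCRW accepts the page cap under the same maxPages key and that block params may arrive as strings. It does not reproduce the PR's actual body builder.

```typescript
// Hypothetical param shape for illustration; the real crw_crawl params in
// apps/sim/tools/crw/crawl.ts may be named or typed differently.
interface CrwCrawlParams {
  apiKey: string
  url: string
  maxPages?: number | string // block inputs may arrive as strings (assumption)
  baseUrl?: string
}

// Builds the JSON body for POST /v1/crawl.
// Assumption: fastCRW reads the page cap from `maxPages`, where the Firecrawl
// block sends `limit`. apiKey and baseUrl are left out on purpose: they belong
// in a request header and in the request URL, not in the body.
function buildCrawlBody(params: CrwCrawlParams): Record<string, unknown> {
  const body: Record<string, unknown> = { url: params.url }
  if (params.maxPages !== undefined && params.maxPages !== '') {
    const maxPages = Number(params.maxPages)
    // Drop non-numeric or non-positive values instead of sending them upstream.
    if (Number.isFinite(maxPages) && maxPages > 0) {
      body.maxPages = Math.floor(maxPages)
    }
  }
  return body
}
```

For example, buildCrawlBody({ apiKey: 'k', url: 'https://example.com', maxPages: '25' }) yields a body with url and maxPages: 25 and no limit key.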

Confidence Score: 4/5

The change is purely additive and isolated to new files; no existing functionality is modified. The three tools with hardcoded success responses will silently swallow API-level errors, but they won't cause data corruption or affect other blocks.

Three of the four new tools (scrape, search, crawl) always return success: true from transformResponse, even when the API body indicates failure. The crawl case is the worst: an undefined jobId makes the poll loop request /v1/crawl/undefined, which masks the real error. The fourth tool (map) handles this correctly, so the inconsistency is self-contained within this PR. No other part of the codebase is touched.

apps/sim/tools/crw/scrape.ts, apps/sim/tools/crw/search.ts, apps/sim/tools/crw/crawl.ts: the transformResponse function in each of these files needs to check data.success before reporting a successful result.
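A minimal sketch of the fix for scrape and search, following the map.ts pattern the review describes: only report success when the API body says so, and pass the API's own error through otherwise. The CrwApiBody and ToolResult shapes and the toToolResult helper are hypothetical stand-ins; Sim's real ToolResponse type and fastCRW's exact response fields may differ.

```typescript
// Hypothetical shape of a fastCRW JSON body (assumption, not the real types.ts).
interface CrwApiBody<T> {
  success?: boolean
  error?: string
  data?: T
}

// Hypothetical stand-in for Sim's tool result type.
interface ToolResult<T> {
  success: boolean
  output: T | Record<string, never>
  error?: string
}

// Converts a parsed API body into a tool result. Unlike a hardcoded
// `success: true`, this surfaces HTTP-200-with-success:false responses
// as failures that carry the API's error message.
function toToolResult<T>(body: CrwApiBody<T>): ToolResult<T> {
  if (body.success !== true || body.data === undefined) {
    return {
      success: false,
      output: {},
      error: body.error ?? 'fastCRW request failed',
    }
  }
  return { success: true, output: body.data }
}
```

Inside each tool, transformResponse would parse the response JSON and return toToolResult(body) instead of building a success object unconditionally.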

Important Files Changed

| Filename | Overview |
|---|---|
| apps/sim/blocks/blocks/crw.ts | New block config mirroring Firecrawl; routes scrape/search/crawl/map to the correct crw_* tools, formats params, and exposes baseUrl for self-hosting. Clean and consistent with existing block patterns. |
| apps/sim/tools/crw/scrape.ts | Scrape tool is structurally correct but hardcodes success: true in transformResponse regardless of API-level errors, unlike map.ts, which properly checks data.success. |
| apps/sim/tools/crw/search.ts | Search tool also hardcodes success: true in transformResponse; same inconsistency with map.ts. Additionally, the limit and sources params are used in the body builder but not declared in the tool's params definition (though this mirrors the Firecrawl search pattern). |
| apps/sim/tools/crw/crawl.ts | Crawl tool implements async polling correctly, but transformResponse ignores data.success. If job creation returns HTTP 200 with success: false, postProcess will poll /v1/crawl/undefined, producing a confusing 404 error instead of the real failure. |
| apps/sim/tools/crw/map.ts | Map tool correctly checks data.success in transformResponse and handles missing links with a fallback array. Well-structured and complete. |
| apps/sim/tools/crw/types.ts | Comprehensive type definitions and output property constants. Clean mirror of the Firecrawl types, with appropriate additions for fastCRW-specific fields. |
| apps/sim/tools/crw/crw.test.ts | Good coverage of URL construction, body building, and response transformation for all four operations. Tests document the expected API response shapes clearly. |
| apps/sim/lib/core/security/csp.ts | Adds https://fastcrw.com to the connect-src allowlist. Covers the full origin, which is sufficient since the API lives at /api/v1/* on the same origin. |
| apps/sim/tools/crw/base-url.ts | Clean utility for resolving the base URL, with trailing-slash stripping and a sensible default. Well-tested. |
| apps/sim/lib/api/contracts/byok-keys.ts | Correctly adds 'crw' to the BYOK provider ID zod schema enum. |
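The base-url.ts row describes a resolver with a default and trailing-slash stripping. A minimal sketch of that behavior is below; it is a hypothetical re-implementation, not the PR's code, and the default value https://fastcrw.com/api is an assumption inferred from the flowchart's fastcrw.com/api label and the CSP note that the API lives under /api/v1/*.

```typescript
// Assumed default; the actual constant in apps/sim/tools/crw/base-url.ts may differ.
const DEFAULT_CRW_BASE_URL = 'https://fastcrw.com/api'

// Hypothetical version of resolveCrwBaseUrl: falls back to the hosted API when
// no self-hosted override is given, and strips trailing slashes so that
// `${base}/v1/scrape` never contains a double slash.
function resolveCrwBaseUrl(baseUrl?: string): string {
  const trimmed = baseUrl?.trim()
  if (!trimmed) return DEFAULT_CRW_BASE_URL
  return trimmed.replace(/\/+$/, '')
}
```

A tool would then build its endpoint as `${resolveCrwBaseUrl(params.baseUrl)}/v1/scrape`, so hosted and self-hosted deployments share one code path. Note that a self-hosted origin other than fastcrw.com would also need to be reachable under the connect-src policy if requests are made from the browser.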

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[CrwBlock - crw.ts] -->|operation=scrape| B[crw_scrape tool]
    A -->|operation=search| C[crw_search tool]
    A -->|operation=crawl| D[crw_crawl tool]
    A -->|operation=map| E[crw_map tool]

    B --> F["POST /v1/scrape\n(fastcrw.com/api)"]
    C --> G["POST /v1/search\n(fastcrw.com/api)"]
    D --> H["POST /v1/crawl\n(fastcrw.com/api)"]
    E --> I["POST /v1/map\n(fastcrw.com/api)"]

    D -->|async job| J[postProcess polling loop]
    J --> K["GET /v1/crawl/{jobId}"]
    K -->|completed| L[Return pages + total]
    K -->|failed| M[Return error]
    K -->|timeout| N[Return timeout error]

    B --> O[transformResponse - always success:true]
    C --> P[transformResponse - always success:true]
    D --> R[transformResponse - always success:true]
    E --> Q[transformResponse - checks data.success]
```
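The crawl branch of the flowchart (poll GET /v1/crawl/{jobId} until completed, failed, or timeout) can be sketched as below. This is an illustrative loop, not the PR's postProcess: pollCrawl, CrawlStatus, and the injected fetchStatus function are hypothetical, and the status strings ('completed', 'failed', anything else meaning "still running") and the data/total/error fields are assumptions about the fastCRW response.

```typescript
// Assumed shape of GET /v1/crawl/{jobId}; field names and status values are guesses.
interface CrawlStatus {
  status: string // assumed: 'completed' | 'failed' | an in-progress value
  data?: unknown[]
  total?: number
  error?: string
}

// Injected so the loop is testable without network access.
type FetchStatus = (url: string) => Promise<CrawlStatus>

interface PollResult {
  success: boolean
  pages?: unknown[]
  total?: number
  error?: string
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Polls the job until it completes, fails, or the deadline passes.
async function pollCrawl(
  baseUrl: string,
  jobId: string,
  fetchStatus: FetchStatus,
  { intervalMs = 2000, timeoutMs = 300000 } = {}
): Promise<PollResult> {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    const status = await fetchStatus(`${baseUrl}/v1/crawl/${jobId}`)
    if (status.status === 'completed') {
      const pages = status.data ?? []
      return { success: true, pages, total: status.total ?? pages.length }
    }
    if (status.status === 'failed') {
      return { success: false, error: status.error ?? 'Crawl job failed' }
    }
    await sleep(intervalMs) // still running: wait before the next status check
  }
  return { success: false, error: `Crawl job ${jobId} timed out after ${timeoutMs} ms` }
}
```

Injecting fetchStatus keeps the three exits (completed, failed, timeout) easy to unit-test with a fake, the same way crw.test.ts exercises the other operations without a live API.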

Comments Outside Diff (1)

  1. apps/sim/tools/crw/crawl.ts, lines 623-634

    P2 transformResponse ignores API-level job creation failure

    If the crawl POST returns HTTP 200 with { success: false, error: "…" }, transformResponse still returns success: true with jobId: undefined. The if (!result.success) check in postProcess therefore passes, and the loop polls ${baseUrl}/v1/crawl/undefined, which returns a 404 and surfaces a confusing "Failed to get crawl status: Not Found" error instead of the original creation error. Guard against this by checking data.success (or at least data.id) in transformResponse before the poll loop begins.
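    A minimal sketch of the suggested guard, assuming the creation response carries success, id, and error fields as the comment describes. CrawlCreateBody, CrawlCreateResult, transformCrawlCreate, and shouldPoll are hypothetical names; the real crawl.ts types and postProcess signature may differ.

```typescript
// Assumed shape of the POST /v1/crawl creation response.
interface CrawlCreateBody {
  success?: boolean
  id?: string
  error?: string
}

// Hypothetical stand-in for the tool result passed to postProcess.
interface CrawlCreateResult {
  success: boolean
  output: { jobId?: string }
  error?: string
}

// Fails fast on an API-level creation error, checking both data.success and
// data.id, so postProcess never polls /v1/crawl/undefined.
function transformCrawlCreate(body: CrawlCreateBody): CrawlCreateResult {
  if (body.success === false || typeof body.id !== 'string' || body.id.length === 0) {
    return {
      success: false,
      output: {},
      error: body.error ?? 'fastCRW did not return a crawl job id',
    }
  }
  return { success: true, output: { jobId: body.id } }
}

// Belt-and-braces check at the top of postProcess: only start polling
// when creation succeeded and a job id is actually present.
function shouldPoll(result: CrawlCreateResult): boolean {
  return result.success && typeof result.output.jobId === 'string'
}
```

    With both checks in place, a rejected crawl reports the API's original message (for example an invalid URL or quota error) rather than a 404 from the status endpoint.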

Reviews (1): Last reviewed commit: "feat: add fastCRW tool block"