feat: add fastCRW tool block by us · Pull Request #5025 · simstudioai/sim
Greptile Summary
This PR adds fastCRW as a new tool block (scrape / crawl / map / search), mirroring the existing Firecrawl block. The integration is additive-only: new files under tools/crw/ and blocks/blocks/crw.ts, plus registration in the block/tool registries, BYOK keys, CSP allowlist, icon, and integrations.json.
- Four tool configs (crw_scrape, crw_search, crw_crawl, crw_map) mirror Firecrawl's structure with fastCRW-specific differences: maxPages instead of limit for crawl, a dynamic baseUrl param for self-hosting, and a resolveCrwBaseUrl helper.
- Registration is complete across all required locations (BYOK schema, type union, CSP, icon mapping, integrations JSON), and a test file covers URL construction, body building, and response transformation for all four operations.
Confidence Score: 4/5
The change is purely additive and isolated to new files; no existing functionality is modified. The three tools with hardcoded success responses will silently swallow API-level errors, but they won't cause data corruption or affect other blocks.
Three of the four new tools (scrape, search, crawl) always return success: true from transformResponse even when the API body indicates failure — the crawl case is the worst because an undefined jobId leads the poll loop to request /v1/crawl/undefined, masking the real error. The fourth tool (map) handles this correctly, making the inconsistency self-contained within this PR. No other part of the codebase is touched.
apps/sim/tools/crw/scrape.ts, apps/sim/tools/crw/search.ts, apps/sim/tools/crw/crawl.ts — the transformResponse functions in all three need to check data.success before reporting a successful result.
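The missing check for scrape and search could look roughly like the sketch below. It assumes the API reports failures as HTTP 200 with a body of the form { success: false, error }. The names CrwApiBody, ToolResult, and toToolResult are hypothetical and not taken from the PR; the real transformResponse receives a Response object and returns a richer output type.

```typescript
// Hypothetical shapes for illustration; the PR's real types live in tools/crw/types.ts.
interface CrwApiBody {
  success?: boolean
  error?: string
  data?: unknown
}

interface ToolResult {
  success: boolean
  output: Record<string, unknown>
  error?: string
}

// Map a parsed API body to a tool result, propagating API-level failures
// instead of unconditionally reporting success.
// Assumption: only an explicit `success: false` counts as a failure.
function toToolResult(body: CrwApiBody): ToolResult {
  if (body.success === false) {
    return {
      success: false,
      output: {},
      error: body.error ?? 'fastCRW request failed without an error message',
    }
  }
  return { success: true, output: { data: body.data ?? null } }
}
```

Whether a missing success field should also count as a failure depends on the API contract, which this review does not spell out.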
Important Files Changed
| Filename | Overview |
|---|---|
| apps/sim/blocks/blocks/crw.ts | New block config mirroring Firecrawl; routes scrape/search/crawl/map to the correct crw_* tools, formats params, and exposes baseUrl for self-hosting. Clean and consistent with existing block patterns. |
| apps/sim/tools/crw/scrape.ts | Scrape tool is structurally correct but hardcodes success: true in transformResponse regardless of API-level errors, unlike map.ts which properly checks data.success. |
| apps/sim/tools/crw/search.ts | Search tool also hardcodes success: true in transformResponse; same inconsistency with map.ts. Additionally, limit and sources params are used in the body builder but not declared in the tool's params definition (though this mirrors the Firecrawl search pattern). |
| apps/sim/tools/crw/crawl.ts | Crawl tool implements async polling correctly, but transformResponse ignores data.success — if job creation returns HTTP 200 with success:false, postProcess will poll /v1/crawl/undefined leading to a confusing 404 error instead of the real failure. |
| apps/sim/tools/crw/map.ts | Map tool correctly checks data.success in transformResponse and handles missing links with a fallback array. Well-structured and complete. |
| apps/sim/tools/crw/types.ts | Comprehensive type definitions and output property constants. Clean mirror of the Firecrawl types, with appropriate additions for fastCRW-specific fields. |
| apps/sim/tools/crw/crw.test.ts | Good coverage of URL construction, body building, and response transformation for all four operations. Tests document the expected API response shapes clearly. |
| apps/sim/lib/core/security/csp.ts | Adds https://fastcrw.com to connect-src allowlist. Covers the full domain/origin, which is sufficient since the API lives at /api/v1/* on the same origin. |
| apps/sim/tools/crw/base-url.ts | Clean utility for resolving the base URL, with trailing-slash stripping and a sensible default. Well-tested. |
| apps/sim/lib/api/contracts/byok-keys.ts | Correctly adds 'crw' to the BYOK provider ID zod schema enum. |
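For reference, the base-URL behavior described for base-url.ts (trailing-slash stripping plus a default) can be sketched as follows. This is not the PR's code: resolveBaseUrlSketch is a hypothetical stand-in for resolveCrwBaseUrl, the default value is inferred from the endpoints shown in this review, and trimming whitespace is an added assumption.

```typescript
// Assumed default, inferred from the endpoints in this review; not confirmed against the PR.
const DEFAULT_CRW_BASE_URL = 'https://fastcrw.com/api'

// Hypothetical reimplementation of the helper described for base-url.ts.
function resolveBaseUrlSketch(baseUrl?: string): string {
  // Treat undefined, empty, and whitespace-only input as "not provided".
  const trimmed = (baseUrl ?? '').trim()
  if (trimmed === '') return DEFAULT_CRW_BASE_URL
  // Strip one or more trailing slashes so `${base}/v1/scrape` never contains `//`.
  return trimmed.replace(/\/+$/, '')
}
```

With this shape, a self-hosted value such as https://crw.example.com/ and the hosted default both produce URLs with a single slash before v1.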
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[CrwBlock - crw.ts] -->|operation=scrape| B[crw_scrape tool]
A -->|operation=search| C[crw_search tool]
A -->|operation=crawl| D[crw_crawl tool]
A -->|operation=map| E[crw_map tool]
B --> F["POST /v1/scrape\n(fastcrw.com/api)"]
C --> G["POST /v1/search\n(fastcrw.com/api)"]
D --> H["POST /v1/crawl\n(fastcrw.com/api)"]
E --> I["POST /v1/map\n(fastcrw.com/api)"]
D -->|async job| J[postProcess polling loop]
J --> K["GET /v1/crawl/{jobId}"]
K -->|completed| L[Return pages + total]
K -->|failed| M[Return error]
K -->|timeout| N[Return timeout error]
B --> O[transformResponse - always success:true]
C --> P[transformResponse - always success:true]
E --> Q[transformResponse - checks data.success]
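The polling branch of the flowchart (completed, failed, timeout) can be sketched as below. This is an illustration under assumptions, not the PR's postProcess: pollCrawl, CrawlStatus, and PollResult are hypothetical names, the in-progress status string is a guess, and the status fetch is injected as a function so the sketch stays self-contained instead of issuing GET /v1/crawl/{jobId} itself.

```typescript
// Hypothetical status shapes; the in-progress status name is an assumption.
type CrawlStatus =
  | { status: 'scraping' }
  | { status: 'completed'; data: unknown[]; total: number }
  | { status: 'failed'; error?: string }

type PollResult =
  | { success: true; pages: unknown[]; total: number }
  | { success: false; error: string }

// Poll until the job completes, fails, or the deadline passes.
async function pollCrawl(
  fetchStatus: () => Promise<CrawlStatus>,
  opts: { intervalMs: number; timeoutMs: number }
): Promise<PollResult> {
  const deadline = Date.now() + opts.timeoutMs
  while (Date.now() < deadline) {
    const current = await fetchStatus()
    if (current.status === 'completed') {
      return { success: true, pages: current.data, total: current.total }
    }
    if (current.status === 'failed') {
      return { success: false, error: current.error ?? 'Crawl job failed' }
    }
    // Still running: wait before the next status request.
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs))
  }
  return { success: false, error: 'Crawl job timed out' }
}
```

Injecting fetchStatus keeps the three exit paths easy to test without a network call; in the real tool the function would wrap the status request for a known job id.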
Comments Outside Diff (1)
- apps/sim/tools/crw/crawl.ts, line 623-634: transformResponse ignores API-level job creation failure

  If the crawl POST returns HTTP 200 with { success: false, error: "…" }, transformResponse still returns success: true with jobId: undefined. postProcess then checks if (!result.success), which does not trigger because success is true, and proceeds to poll ${baseUrl}/v1/crawl/undefined. That request returns a 404 and surfaces a confusing "Failed to get crawl status: Not Found" error rather than the original creation error. Guard against this by checking data.success (or at least data.id) in transformResponse before the poll loop begins.
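A minimal sketch of such a guard for the crawl job-creation response, assuming the body carries the job identifier in an id field as the comment suggests. CrawlStartBody, CrawlStartResult, and transformCrawlStart are hypothetical names, and the fallback error text is invented for illustration.

```typescript
// Hypothetical shape of the job-creation body; field names follow the review comment.
interface CrawlStartBody {
  success?: boolean
  id?: string
  error?: string
}

type CrawlStartResult =
  | { success: true; output: { jobId: string } }
  | { success: false; output: Record<string, never>; error: string }

// Report failure when the API says so or when no job id came back,
// so the caller never polls /v1/crawl/undefined.
function transformCrawlStart(body: CrawlStartBody): CrawlStartResult {
  if (body.success === false || !body.id) {
    return {
      success: false,
      output: {},
      error: body.error ?? 'Crawl job creation returned no job id',
    }
  }
  return { success: true, output: { jobId: body.id } }
}
```

Because postProcess already checks result.success before polling, returning success: false here would surface the original creation error without changes to the poll loop.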
Reviews (1): Last reviewed commit: "feat: add fastCRW tool block"