Train your AI on Web Crawler

Authentication: Public crawling (no auth) | Incremental sync | Full refresh

Open's web crawler indexes any public website you point it at (your docs site, help center, blog, knowledge base, or marketing pages) and feeds the content into your AI agent's knowledge. It recrawls on a schedule so the agent stays current.

Connect once, and Open keeps your AI agent's knowledge up to date automatically. When you update content in Web Crawler, the changes sync automatically, with no manual retraining needed.

Web Crawler → syncs with → AI agent

What you can sync

Pages — All discoverable pages with full HTML extraction.

Sitemaps — sitemap.xml-based discovery.

Selectors — CSS-selector-based content extraction.
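
As an illustration of sitemap-based discovery, the sketch below pulls page URLs out of a sitemap.xml document using Python's standard library. It is not Open's implementation; the function name `urls_from_sitemap` and the example.com URLs are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Illustrative sitemap; the URLs are placeholders.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/docs/setup</loc></url>
</urlset>"""

print(urls_from_sitemap(SAMPLE))
```

The same function also collects the `<loc>` entries of a sitemap index file, since those use the same namespace; a crawler would then fetch each listed sitemap in turn.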

Features

• Sitemap-aware: Reads sitemap.xml when present and follows discovered links recursively.
• Smart re-crawl: Recrawls on a schedule (daily, weekly, or monthly) and updates only the pages that changed.
• Selector-based extraction: Configure CSS selectors to skip nav, footer, and ads, so only the content the agent needs is indexed.
• JS rendering: Renders JavaScript-heavy SPAs so client-rendered content gets indexed too.
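
The page does not say how changed pages are detected on a recrawl. One common approach, sketched here as an assumption rather than a description of Open's internals, is to store a hash of each page's extracted text and compare it on the next crawl. The names `content_fingerprint` and `pages_to_update` are hypothetical.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """Hash page text after collapsing whitespace, so spacing-only edits are ignored."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pages_to_update(previous: dict[str, str], crawled: dict[str, str]) -> list[str]:
    """Return URLs that are new or whose text differs from the stored fingerprint.

    previous: url -> fingerprint saved by the last crawl
    crawled:  url -> text extracted by the current crawl
    """
    return [
        url
        for url, text in crawled.items()
        if previous.get(url) != content_fingerprint(text)
    ]


# Placeholder data: page "a" only changed its spacing, page "b" is new.
previous = {"https://example.com/a": content_fingerprint("Hello world")}
crawled = {
    "https://example.com/a": "Hello   world",
    "https://example.com/b": "New page",
}
print(pages_to_update(previous, crawled))  # ['https://example.com/b']
```

Hashing the extracted text rather than the raw HTML means that markup-only changes, such as a rotated ad slot, do not trigger a re-index.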

Requirements

• Public or basic-auth-protected website

How to connect

1. In Open, go to AI Training → Sources → Add Web Crawler
2. Enter the seed URL and (optionally) a sitemap URL
3. Configure selectors to include / exclude (nav, footer, ads)
4. Set the recrawl schedule
5. Run the first crawl and review the indexed content
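
To picture what the exclude list in step 3 does, here is a minimal sketch that drops the text inside excluded elements and keeps the rest. It is a simplification under stated assumptions: plain tag names stand in for CSS selectors (matching classes or IDs would need a real selector engine), and `ContentExtractor` and `extract_text` are hypothetical names, not part of Open.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect page text while skipping everything inside excluded tags."""

    def __init__(self, exclude_tags):
        super().__init__()
        self.exclude_tags = set(exclude_tags)
        self.skip_depth = 0  # greater than 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.exclude_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.exclude_tags and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str, exclude_tags=("nav", "footer", "script", "style")) -> str:
    """Return the visible text of a page, minus the excluded elements."""
    parser = ContentExtractor(exclude_tags)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Placeholder page with navigation and footer chrome around the real content.
PAGE = (
    "<html><body><nav>Home | Docs</nav>"
    "<main><h1>Setup</h1><p>Install the agent.</p></main>"
    "<footer>Contact us</footer></body></html>"
)
print(extract_text(PAGE))  # Setup Install the agent.
```

Reviewing the indexed content after the first crawl (step 5) is the practical way to check that the chosen exclusions removed the page chrome without cutting real content.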

Important information

• Respects robots.txt by default; can be overridden for sites you own
• JS-heavy SPAs supported via headless rendering
• Per-domain rate limits keep crawls polite
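
For readers curious what robots.txt handling and per-domain rate limiting look like in practice, the following sketch uses Python's standard `urllib.robotparser`. The user agent string, the delay value, and the `DomainRateLimiter` class are assumptions for illustration; Open's actual crawler settings are not documented here.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleCrawler"  # hypothetical; not Open's real user agent


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class DomainRateLimiter:
    """Enforce a minimum delay between requests to the same domain."""

    def __init__(self, min_delay_seconds: float):
        self.min_delay = min_delay_seconds
        self.next_allowed = {}  # domain -> earliest time of the next request

    def seconds_to_wait(self, url: str, now: float) -> float:
        """Return how long to pause before fetching url, and reserve that slot."""
        domain = urlparse(url).netloc
        wait = max(0.0, self.next_allowed.get(domain, now) - now)
        self.next_allowed[domain] = now + wait + self.min_delay
        return wait


# Placeholder rules and URLs.
robots = build_robots("User-agent: *\nDisallow: /private/\n")
limiter = DomainRateLimiter(min_delay_seconds=1.0)
for url in ["https://example.com/docs", "https://example.com/private/x"]:
    if robots.can_fetch(USER_AGENT, url):
        time.sleep(limiter.seconds_to_wait(url, time.monotonic()))
        print("fetch", url)
    else:
        print("skip (disallowed by robots.txt)", url)
```

Keeping the limiter keyed by domain lets a crawl proceed at full speed across different sites while still spacing out requests to any single one.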

Security: Open only needs read access to your Web Crawler. We never write, modify, or delete your content. All data is encrypted in transit and at rest. GDPR compliant, and we are working toward SOC 2 Type II.

Ready to connect Web Crawler?

AI Training → Sources → Web Crawler


Other training sources

Notion

Confluence

Google Docs