[n8n] Web Scraping with FireCrawl & Google Sheets (Basic)

This n8n workflow automates the process of scraping website content using FireCrawl and saving the extracted data into a Google Sheet in a structured way.

Automate web scraping with FireCrawl and save content to Google Sheets

Who is this for?

This template is ideal for marketers, content creators, researchers, and developers who need to extract information from websites quickly and easily without writing code. It’s perfect for tasks like content aggregation, competitor analysis, data collection for AI model training, and market research.

Key Features

  • One-click Scraping: Scrape data from any URL with a single execution.
  • AI-Powered Extraction: Leverages FireCrawl to intelligently crawl and extract clean markdown and metadata from websites, ready for large language models (LLMs).
  • Automated Data Organization: Automatically formats and saves the scraped content into a new Google Sheet.
  • Google Drive Integration: Keeps your data organized by storing the generated Google Sheets files in a specific Google Drive folder.

How it works

  1. Manual Trigger: The workflow is started manually when you click ‘Execute workflow’.
  2. Specify URL: A “Set” node holds the target URL you want to scrape.
  3. Scrape with FireCrawl: An HTTP Request node sends the URL to FireCrawl’s API, which scrapes the website’s content.
  4. Create Spreadsheet: The scraped data, formatted in markdown, is used to create a binary spreadsheet file.
  5. Upload to Google Drive: The newly created spreadsheet is uploaded to your Google Drive.
  6. Update Google Sheet: The content is written to a specific Google Sheet using the file ID from the previous step, making the data accessible and easy to use.
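
The FireCrawl step (step 3) can be sketched as a plain HTTP request. This is a hypothetical helper, not the template's exact node configuration; the endpoint and field names follow FireCrawl's documented v1 scrape API, so inspect the node's settings if yours differ:

```javascript
// Build the request the HTTP Request node sends to FireCrawl's scrape endpoint.
function buildScrapeRequest(url, apiKey) {
  return {
    method: 'POST',
    url: 'https://api.firecrawl.dev/v1/scrape',
    headers: {
      'Content-Type': 'application/json',
      // The n8n 'Header Auth' credential produces exactly this header:
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      url,                    // the page to scrape
      formats: ['markdown'],  // ask for LLM-ready markdown
    }),
  };
}

// Usage sketch:
// const req = buildScrapeRequest('https://example.com', process.env.FIRECRAWL_API_KEY);
// fetch(req.url, req) would then perform the actual call.
```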

Requirements

  • n8n instance: You need a working n8n instance.
  • FireCrawl account: A FireCrawl account and API key are required. You can sign up at firecrawl.dev.
  • Google credentials: You need to have n8n credentials set up for both Google Drive and Google Sheets.

Step-by-step Setup

  1. Import template: Import the template into your n8n canvas.
  2. Configure FireCrawl:
    • Select the Scrape an URL node.
    • Under ‘Authentication’, select ‘Header Auth’.
    • Click ‘Create New Credential’ to create a new generic credential.
    • Enter a ‘Credential Name’ (e.g., “FireCrawl API Key”).
    • Set the ‘Header Auth Name’ to Authorization.
    • In the ‘Header Auth Value’ field, enter Bearer YOUR_FIRECRAWL_API_KEY, replacing YOUR_FIRECRAWL_API_KEY with your actual API key.
    • Save the credential.
  3. Configure Google Drive:
    • Select the Upload spreadsheet to Google Drive node.
    • Choose your Google account from the ‘Credential’ dropdown menu or create a new one.
    • Specify the ‘Parent Folder ID’ in your Google Drive where you want to save the spreadsheets.
  4. Configure Google Sheets:
    • Select the Update Google Sheets node.
    • Select the same Google account credential.
    • Enter the ‘Sheet ID’ of the Google Sheet you want to populate with the scraped data.
  5. Set your target URL:
    • Select the Input an URL node.
    • In the ‘Value’ field for the URL variable, replace the default URL with the website you want to scrape.
  6. Activate and execute:
    • Activate the workflow using the toggle switch in the top-right corner.
    • Click ‘Execute workflow’ to run it. Your scraped data will appear in the specified Google Sheet.
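
To see how the scraped output becomes a spreadsheet row, here is a minimal sketch of the flattening the workflow performs. The response shape (`data.markdown`, `data.metadata.title`) is an assumption based on FireCrawl's typical v1 response; inspect the Scrape an URL node's output to confirm the real field names:

```javascript
// Hypothetical helper: flatten a FireCrawl response into one sheet row.
function toSheetRow(firecrawlResponse, sourceUrl) {
  const data = firecrawlResponse.data || {};
  const meta = data.metadata || {};
  return [
    sourceUrl,
    meta.title || '',
    // A Google Sheets cell holds at most 50,000 characters, so truncate.
    (data.markdown || '').slice(0, 50000),
  ];
}
```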

How to customize the workflow

  • Batch Scraping: To scrape multiple URLs at once, replace the Input an URL node with a Code node or a Google Sheets node that outputs a list of URLs. Then, connect it to the Scrape an URL node to process them sequentially.
  • Different Data Points: FireCrawl can extract more than just markdown content (e.g., metadata, HTML). Modify the Create spreadsheet node to include other data points from the FireCrawl output as needed. You can inspect the output of the Scrape an URL node to see all available data.
  • Error Handling: Add an ‘Error Trigger’ to the workflow to catch any potential issues during the scraping process (e.g., a website blocking the scrape) and send a notification via Slack or email.
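
For the batch-scraping customization above, a Code node replacing Input an URL could look like the following sketch. Each n8n item wraps its payload in a `json` key, so the downstream Scrape an URL node runs once per URL (the URLs are placeholders):

```javascript
// Possible Code node body: emit one n8n item per URL to scrape.
const urls = [
  'https://example.com/page-1',
  'https://example.com/page-2',
];

const items = urls.map((url) => ({ json: { url } }));

// In the Code node, end with:  return items;
```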

FAQ – Frequently Asked Questions

1. Who is this workflow intended for?
This workflow is designed for users who have a basic understanding of n8n and are capable of troubleshooting issues on their own. If you’re comfortable adjusting node settings and handling minor issues, this product is a great fit for you.


2. How is the workflow installed and used?
The workflow comes pre-configured by default, so you can import and run it immediately. However, to achieve optimal results for your specific use case or business needs, you may need to customize the node settings.


3. What should I keep in mind during testing?
During testing, we recommend using low-cost models (such as mini or flash variants) and generating low-resolution images to save on costs. The primary goal is to ensure the workflow operates reliably before making further optimizations. Note that low-cost models may cause errors in the workflow.


4. What are the default and alternative AI models?
By default, the workflow uses the GPT-4o model due to its stability and its reliable ability to return data in the required JSON format. If you encounter any issues, you can try switching to another OpenAI model. Note that some other models (like Gemini Flash) may not return results in JSON format or support tool calls, which could cause the workflow to malfunction.


5. How do I troubleshoot if the workflow fails to run?
Please try the following steps:

  • Run the workflow in an incognito window with all plugins disabled.
  • Try using a different browser (for example, switch from Chrome to Safari).
  • Test on another computer or in a different network environment or server.

Keep in mind that issues can stem from various sources, including limitations of the AI model, your self-hosted n8n server, the n8n platform itself, or even your local device, network, or server settings.

6. How can I submit feedback or report a bug?
You can contact us to submit your suggestions, comments, or bug reports related to the workflow and documentation. Every piece of feedback is carefully reviewed to address bugs or incorporate quality improvements in future versions.


7. Is technical support included after purchase?
At present, purchasing the workflow provides you with the file only, without any technical support. In the future, we plan to offer additional support packages, including tutorial videos, technical consulting, and customization services based on customer needs.


8. Can I share or resell the workflow?
Please do not share or resell the workflow without obtaining prior permission from us. The product is protected by copyright, and unauthorized sharing or resale is strictly prohibited.


9. How do I submit feedback on my purchasing experience?
If you have any comments or suggestions regarding your purchasing experience, please send us a message. Your input is valuable to us and will help improve our services and product quality.


10. What is the refund policy?
Due to the nature of the workflow product, our shop does not currently offer refunds for purchases. In the future, we plan to sell our products on platforms that support refund policies. However, please note that the prices on those platforms will be significantly higher compared to purchasing directly from our shop.


If you have any further questions or need additional information, please feel free to contact us through our contact form.

Truly,
AI Automation Pro
