Limit, Filter and Remove Duplicates in n8n
Learn how to use Limit, Filter, and Remove Duplicates nodes in n8n to clean, filter, and manage your workflow data efficiently.
In this tutorial, you'll learn how to use three essential data manipulation nodes in n8n: Limit, Filter, and Remove Duplicates. These nodes help you clean, organize, and control the data flowing through your workflows, ensuring you only process the data you need.
Overview
When working with large datasets or API responses, you often need to:
- Limit the number of items processed
- Filter items based on specific conditions
- Remove Duplicates to avoid processing the same data multiple times
These three nodes are fundamental building blocks for efficient workflow design and data management.
Limit Node
The Limit node restricts the number of items that pass through to the next node in your workflow.
When to Use Limit Node
- Process only the first N items from a large dataset
- Implement pagination manually
- Limit API calls to stay within rate limits
- Test workflows with a subset of data
- Control processing costs
Configuration Options
Max Items: The maximum number of items to pass through
Keep: Choose which items to keep
- First Items: Keep the first N items (default)
- Last Items: Keep the last N items
Example Use Cases
Use Case 1: Process First 10 Records
Input: 100 customer records
Limit Node Configuration:
- Max Items: 10
- Keep: First Items
Output: First 10 customer records
Use Case 2: Get Latest 5 Orders
Input: 50 orders (sorted by date descending)
Limit Node Configuration:
- Max Items: 5
- Keep: First Items
Output: 5 most recent orders
Use Case 3: Testing with Sample Data
Input: 1000 items from database
Limit Node Configuration:
- Max Items: 20
- Keep: First Items
Output: 20 items for testing
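If you ever need the same behavior inside a Code node (for example, to combine it with other logic), a minimal sketch that reproduces the Limit node is shown below. It assumes the Code node's "Run Once for All Items" mode, where `$input.all()` returns every incoming item; the variable names are illustrative, and in practice you should prefer the real Limit node.

```javascript
// Rough equivalent of the Limit node inside an n8n Code node
// ("Run Once for All Items" mode assumed).
const maxItems = 10;           // corresponds to "Max Items"
const keep = 'firstItems';     // corresponds to "Keep": 'firstItems' or 'lastItems'

const items = $input.all();    // all incoming items, each with a .json payload
return keep === 'firstItems'
  ? items.slice(0, maxItems)   // keep the first N items
  : items.slice(-maxItems);    // keep the last N items
```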
Best Practices
- Use before expensive operations (API calls, database writes)
- Combine with Sort node to get top/bottom N items
- Helpful for workflow development and testing
- Consider using with Loop nodes for batch processing
Filter Node
The Filter node keeps or removes items based on conditions you define. It's like a bouncer that only lets certain data through.
When to Use Filter Node
- Keep only items that meet specific criteria
- Remove items with missing or invalid data
- Filter by date ranges, status, or any field value
- Implement business logic rules
- Clean data before processing
Configuration Options
Conditions: Define rules to filter items
Combine Conditions:
- AND: All conditions must be true
- OR: At least one condition must be true
Keep Items Where: an item is passed on only when the defined conditions evaluate to true
Filter Operators
- Equals: Field equals value
- Not Equals: Field does not equal value
- Contains: Field contains text
- Does Not Contain: Field does not contain text
- Starts With: Field starts with text
- Ends With: Field ends with text
- Exists: Field exists
- Does Not Exist: Field does not exist
- Greater Than: Numeric comparison
- Less Than: Numeric comparison
- Is Empty: Field is empty or null
- Is Not Empty: Field has a value
- Regex: Match using regular expressions
Example Use Cases
Use Case 1: Filter Active Users
Condition:
Field: status
Operation: Equals
Value: active
Input:
[
{ "name": "John", "status": "active" },
{ "name": "Jane", "status": "inactive" },
{ "name": "Bob", "status": "active" }
]
Output:
[
{ "name": "John", "status": "active" },
{ "name": "Bob", "status": "active" }
]
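For comparison, here is the same "status equals active" rule written as a Code-node sketch (again assuming the Code node's `$input.all()` helper; the field name matches the example above):

```javascript
// Keep only items whose status field equals "active"
return $input.all().filter(item => item.json.status === 'active');
```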
Use Case 2: Filter High-Value Orders
Condition:
Field: order_total
Operation: Greater Than
Value: 500
Input:
[
{ "order_id": "001", "order_total": 750 },
{ "order_id": "002", "order_total": 200 },
{ "order_id": "003", "order_total": 1200 }
]
Output:
[
{ "order_id": "001", "order_total": 750 },
{ "order_id": "003", "order_total": 1200 }
]
Use Case 3: Multiple Conditions (AND)
Combine: AND
Condition 1:
Field: country
Operation: Equals
Value: USA
Condition 2:
Field: age
Operation: Greater Than
Value: 18
Result: Only users from USA who are over 18
Use Case 4: Multiple Conditions (OR)
Combine: OR
Condition 1:
Field: priority
Operation: Equals
Value: urgent
Condition 2:
Field: status
Operation: Equals
Value: critical
Result: Items that are either urgent OR critical
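If the built-in condition builder gets unwieldy, the same AND/OR logic can be expressed in a Code-node sketch; swap `||` for `&&` to require both conditions instead of either one (field names taken from the examples above):

```javascript
// OR: keep items that are either urgent or critical
return $input.all().filter(item =>
  item.json.priority === 'urgent' || item.json.status === 'critical'
);
```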
Use Case 5: Filter Valid Emails
Condition:
Field: email
Operation: Regex
Value: ^[^\s@]+@[^\s@]+\.[^\s@]+$
Result: Only items with valid email format
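The same check can be done in a Code node when you need more than one validation step. This sketch reuses the regex above and treats a missing email field as invalid:

```javascript
// Keep only items whose email field matches a basic email pattern
const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
return $input.all().filter(item => emailPattern.test(item.json.email ?? ''));
```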
Filter Best Practices
- Use specific conditions to reduce false positives
- Test with sample data before production
- Consider using expressions for complex logic
- Chain multiple Filter nodes for readability
- Use "Is Not Empty" to ensure required fields exist
Remove Duplicates Node
The Remove Duplicates node eliminates duplicate items from your data based on specified fields.
When to Use Remove Duplicates Node
- Clean datasets with duplicate entries
- Prevent duplicate API calls or database writes
- Consolidate data from multiple sources
- Ensure unique records in reports
- Optimize workflow performance
Configuration Options
Compare: Choose which fields to compare for duplicates
- All Fields: Items must match on all fields
- Selected Fields: Specify which fields to compare
Fields to Compare: Select specific fields to determine duplicates
Example Use Cases
Use Case 1: Remove Duplicate Emails
Configuration:
Compare: Selected Fields
Fields: email
Input:
[
{ "name": "John", "email": "john@example.com" },
{ "name": "Jane", "email": "jane@example.com" },
{ "name": "John Doe", "email": "john@example.com" }
]
Output:
[
{ "name": "John", "email": "john@example.com" },
{ "name": "Jane", "email": "jane@example.com" }
]
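Conceptually, the node keeps the first item it sees for each compared value and drops later matches. A Code-node sketch of that behavior for the email example (illustrative only; the real node handles this for you):

```javascript
// Keep the first item per email, drop later duplicates
const seen = new Set();
return $input.all().filter(item => {
  const key = item.json.email;
  if (seen.has(key)) return false; // already seen this email
  seen.add(key);
  return true;
});
```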
Use Case 2: Remove Duplicate Orders by ID
Configuration:
Compare: Selected Fields
Fields: order_id
Input:
[
{ "order_id": "001", "total": 100 },
{ "order_id": "002", "total": 200 },
{ "order_id": "001", "total": 100 }
]
Output:
[
{ "order_id": "001", "total": 100 },
{ "order_id": "002", "total": 200 }
]
Use Case 3: Remove Exact Duplicates
Configuration:
Compare: All Fields
Input:
[
{ "name": "John", "age": 30 },
{ "name": "Jane", "age": 25 },
{ "name": "John", "age": 30 }
]
Output:
[
{ "name": "John", "age": 30 },
{ "name": "Jane", "age": 25 }
]
Use Case 4: Multiple Field Comparison
Configuration:
Compare: Selected Fields
Fields: first_name, last_name
Result: Removes duplicates where both first and last name match
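With multiple comparison fields, the duplicate key is effectively the combination of those fields. A sketch of that composite-key idea, using the first_name and last_name fields from this example:

```javascript
// Treat first_name + last_name together as the uniqueness key
const seen = new Set();
return $input.all().filter(item => {
  const key = `${item.json.first_name}|${item.json.last_name}`;
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
});
```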
Remove Duplicates Best Practices
- Place early in workflow to reduce processing
- Use specific fields for better control
- Consider case sensitivity in comparisons
- Test with your actual data structure
- Document which fields define uniqueness
Combining All Three Nodes
The real power comes from using these nodes together in your workflows.
Pattern 1: Filter → Remove Duplicates → Limit
1. Filter Node: Keep only active customers
2. Remove Duplicates: Remove duplicate emails
3. Limit Node: Take first 100 for processing
Result: 100 unique, active customers
Pattern 2: Remove Duplicates → Filter → Limit
1. Remove Duplicates: Remove duplicate orders
2. Filter Node: Keep orders > $500
3. Limit Node: Take the first 50 orders (add a Sort node before the Limit if you want the top 50 by value)
Result: 50 unique, high-value orders
Pattern 3: Limit → Filter → Remove Duplicates
1. Limit Node: Take first 1000 items (performance)
2. Filter Node: Keep only valid records
3. Remove Duplicates: Remove any duplicates
Result: Unique valid records from first 1000 items
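To make the ordering concrete, here is Pattern 1 collapsed into a single Code-node sketch. In a real workflow you would normally keep the three separate nodes, which are easier to read and debug; field names are illustrative:

```javascript
// Pattern 1 in one pass: Filter -> Remove Duplicates (by email) -> Limit
const seen = new Set();
return $input.all()
  .filter(item => item.json.status === 'active')   // keep active customers
  .filter(item => {                                 // drop duplicate emails
    if (seen.has(item.json.email)) return false;
    seen.add(item.json.email);
    return true;
  })
  .slice(0, 100);                                   // take the first 100
```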
Practical Workflow Examples
Example 1: Clean Customer Contact List
Workflow:
1. Get contacts from database
2. Filter: Remove contacts without email
3. Filter: Keep only verified contacts
4. Remove Duplicates: By email field
5. Limit: Process 500 per batch
6. Send to email marketing platform
Example 2: Process Recent Orders
Workflow:
1. Fetch orders from API
2. Filter: Orders from last 30 days
3. Filter: Status = "completed"
4. Remove Duplicates: By order_id
5. Limit: Top 100 orders
6. Generate report
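Step 2 of this workflow filters for orders from the last 30 days. A minimal Code-node sketch of that date check is below; the created_at field name is an assumption about the API response, so adjust it to match your data.

```javascript
// Keep only orders created within the last 30 days
const cutoff = Date.now() - 30 * 24 * 60 * 60 * 1000;
return $input.all().filter(item =>
  new Date(item.json.created_at).getTime() >= cutoff // created_at is illustrative
);
```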
Example 3: Lead Qualification
Workflow:
1. Get leads from multiple sources
2. Remove Duplicates: By email and phone
3. Filter: Score > 70
4. Filter: Country in target list
5. Limit: 50 leads per day
6. Assign to sales team
Performance Tips
Order Matters: Place nodes strategically
- Remove duplicates early to reduce data volume
- Filter before expensive operations
- Limit early when testing
Use Expressions: For complex filtering logic, for example:
{{ $json.value > 100 && $json.status === "active" }}
Batch Processing: Combine with Loop nodes
- Process data in chunks
- Prevent timeout errors
- Better resource management
Test Incrementally:
- Start with small Limit values
- Test filters with sample data
- Verify duplicate removal logic
Troubleshooting
Filter Not Working as Expected?
- Check field names match exactly (case-sensitive)
- Verify data types (string vs number)
- Test conditions with Execute Node
- Use expressions for complex logic
Remove Duplicates Not Removing Items?
- Verify field names are correct
- Check for trailing spaces or case differences
- Consider normalizing data before comparison
- Test with "All Fields" to see behavior
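A quick way to normalize before comparison is a Code node (or the Set node) that trims and lower-cases the field you deduplicate on. This sketch assumes an email field:

```javascript
// Normalize the email field so "John@Example.com " and "john@example.com" match
return $input.all().map(item => ({
  json: { ...item.json, email: (item.json.email ?? '').trim().toLowerCase() },
}));
```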
Limit Node Returning Wrong Items?
- Ensure data is sorted correctly
- Check if you need "First" or "Last" items
- Verify input data count
Key Benefits
- Data Quality: Clean and validated data flowing through workflows
- Performance: Process only necessary items, reducing costs and time
- Reliability: Prevent duplicate processing and errors
- Flexibility: Combine nodes for complex data manipulation
- Control: Precise control over data flow and processing
- Scalability: Handle large datasets efficiently
Common Use Cases Summary
| Node | Primary Use | When to Use |
|------|-------------|-------------|
| Limit | Control quantity | Testing, pagination, cost control |
| Filter | Control quality | Data validation, business rules |
| Remove Duplicates | Ensure uniqueness | Data consolidation, prevent duplicates |
Conclusion
The Limit, Filter, and Remove Duplicates nodes are essential tools for data management in n8n. By mastering these nodes, you can build more efficient, reliable, and cost-effective workflows that process exactly the data you need, exactly how you need it.
Start by using each node individually to understand their behavior, then combine them strategically to create powerful data processing pipelines. Remember: the order in which you use these nodes can significantly impact your workflow's performance and results.