Limit, Filter and Remove Duplicates in n8n
Learn how to use Limit, Filter, and Remove Duplicates nodes in n8n to clean, filter, and manage your workflow data efficiently.
In this tutorial, you'll learn how to use three essential data manipulation nodes in n8n: Limit, Filter, and Remove Duplicates. These nodes help you clean, organize, and control the data flowing through your workflows, ensuring you only process the data you need.
Overview
When working with large datasets or API responses, you often need to:
- Limit the number of items processed
- Filter items based on specific conditions
- Remove Duplicates to avoid processing the same data multiple times
These three nodes are fundamental building blocks for efficient workflow design and data management.
Limit Node
The Limit node restricts the number of items that pass through to the next node in your workflow.
When to Use Limit Node
- Process only the first N items from a large dataset
- Implement pagination manually
- Limit API calls to stay within rate limits
- Test workflows with a subset of data
- Control processing costs
Configuration Options
Max Items: The maximum number of items to pass through
Keep: Choose which items to keep
- First Items: Keep the first N items (default)
- Last Items: Keep the last N items
Example Use Cases
Use Case 1: Process First 10 Records
Input: 100 customer records
Limit Node Configuration:
- Max Items: 10
- Keep: First Items
Output: First 10 customer records
Use Case 2: Get Latest 5 Orders
Input: 50 orders (sorted by date descending)
Limit Node Configuration:
- Max Items: 5
- Keep: First Items
Output: 5 most recent orders
Use Case 3: Testing with Sample Data
Input: 1000 items from database
Limit Node Configuration:
- Max Items: 20
- Keep: First Items
Output: 20 items for testing
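If you ever need the same behavior inside a Code node (for example, to combine it with other logic), a minimal sketch that reproduces the Limit node is shown below. It assumes the Code node's "Run Once for All Items" mode, where `$input.all()` returns every incoming item; the variable names are illustrative, and in practice you should prefer the real Limit node.

```javascript
// Rough equivalent of the Limit node inside an n8n Code node
// ("Run Once for All Items" mode assumed).
const maxItems = 10;           // corresponds to "Max Items"
const keep = 'firstItems';     // corresponds to "Keep": 'firstItems' or 'lastItems'

const items = $input.all();    // all incoming items, each with a .json payload
return keep === 'firstItems'
  ? items.slice(0, maxItems)   // keep the first N items
  : items.slice(-maxItems);    // keep the last N items
```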
Best Practices
- Use before expensive operations (API calls, database writes)
- Combine with Sort node to get top/bottom N items
- Helpful for workflow development and testing
- Consider using with Loop nodes for batch processing
Filter Node
The Filter node keeps or removes items based on conditions you define. It's like a bouncer that only lets certain data through.
When to Use Filter Node
- Keep only items that meet specific criteria
- Remove items with missing or invalid data
- Filter by date ranges, status, or any field value
- Implement business logic rules
- Clean data before processing
Configuration Options
Conditions: Define rules to filter items
Combine Conditions:
- AND: All conditions must be true
- OR: At least one condition must be true
Keep Items Where: an item is passed on only when the defined conditions evaluate to true
Filter Operators
- Equals: Field equals value
- Not Equals: Field does not equal value
- Contains: Field contains text
- Does Not Contain: Field does not contain text
- Starts With: Field starts with text
- Ends With: Field ends with text
- Exists: Field exists
- Does Not Exist: Field does not exist
- Greater Than: Numeric comparison
- Less Than: Numeric comparison
- Is Empty: Field is empty or null
- Is Not Empty: Field has a value
- Regex: Match using regular expressions
Example Use Cases
Use Case 1: Filter Active Users
Condition:
Field: status
Operation: Equals
Value: active
Input:
[
{ "name": "John", "status": "active" },
{ "name": "Jane", "status": "inactive" },
{ "name": "Bob", "status": "active" }
]
Output:
[
{ "name": "John", "status": "active" },
{ "name": "Bob", "status": "active" }
]
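For comparison, here is the same "status equals active" rule written as a Code-node sketch (again assuming the Code node's `$input.all()` helper; the field name matches the example above):

```javascript
// Keep only items whose status field equals "active"
return $input.all().filter(item => item.json.status === 'active');
```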
Use Case 2: Filter High-Value Orders
Condition:
Field: order_total
Operation: Greater Than
Value: 500
Input:
[
{ "order_id": "001", "order_total": 750 },
{ "order_id": "002", "order_total": 200 },
{ "order_id": "003", "order_total": 1200 }
]
Output:
[
{ "order_id": "001", "order_total": 750 },
{ "order_id": "003", "order_total": 1200 }
]
Use Case 3: Multiple Conditions (AND)
Combine: AND
Condition 1:
Field: country
Operation: Equals
Value: USA
Condition 2:
Field: age
Operation: Greater Than
Value: 18
Result: Only users from USA who are over 18
Use Case 4: Multiple Conditions (OR)
Combine: OR
Condition 1:
Field: priority
Operation: Equals
Value: urgent
Condition 2:
Field: status
Operation: Equals
Value: critical
Result: Items that are either urgent OR critical
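If the built-in condition builder gets unwieldy, the same AND/OR logic can be expressed in a Code-node sketch; swap `||` for `&&` to require both conditions instead of either one (field names taken from the examples above):

```javascript
// OR: keep items that are either urgent or critical
return $input.all().filter(item =>
  item.json.priority === 'urgent' || item.json.status === 'critical'
);
```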
Use Case 5: Filter Valid Emails
Condition:
Field: email
Operation: Regex
Value: ^[^\s@]+@[^\s@]+\.[^\s@]+$
Result: Only items with valid email format
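The same check can be done in a Code node when you need more than one validation step. This sketch reuses the regex above and treats a missing email field as invalid:

```javascript
// Keep only items whose email field matches a basic email pattern
const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
return $input.all().filter(item => emailPattern.test(item.json.email ?? ''));
```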
Filter Best Practices
- Use specific conditions to reduce false positives
- Test with sample data before production
- Consider using expressions for complex logic
- Chain multiple Filter nodes for readability
- Use "Is Not Empty" to ensure required fields exist
Remove Duplicates Node
The Remove Duplicates node eliminates duplicate items from your data based on specified fields.
When to Use Remove Duplicates Node
- Clean datasets with duplicate entries
- Prevent duplicate API calls or database writes
- Consolidate data from multiple sources
- Ensure unique records in reports
- Optimize workflow performance
Configuration Options
Compare: Choose which fields to compare for duplicates
- All Fields: Items must match on all fields
- Selected Fields: Specify which fields to compare
Fields to Compare: Select specific fields to determine duplicates
Example Use Cases
Use Case 1: Remove Duplicate Emails
Configuration:
Compare: Selected Fields
Fields: email
Input:
[
{ "name": "John", "email": "john@example.com" },
{ "name": "Jane", "email": "jane@example.com" },
{ "name": "John Doe", "email": "john@example.com" }
]
Output:
[
{ "name": "John", "email": "john@example.com" },
{ "name": "Jane", "email": "jane@example.com" }
]
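Conceptually, the node keeps the first item it sees for each compared value and drops later matches. A Code-node sketch of that behavior for the email example (illustrative only; the real node handles this for you):

```javascript
// Keep the first item per email, drop later duplicates
const seen = new Set();
return $input.all().filter(item => {
  const key = item.json.email;
  if (seen.has(key)) return false; // already seen this email
  seen.add(key);
  return true;
});
```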
Use Case 2: Remove Duplicate Orders by ID
Configuration:
Compare: Selected Fields
Fields: order_id
Input:
[
{ "order_id": "001", "total": 100 },
{ "order_id": "002", "total": 200 },
{ "order_id": "001", "total": 100 }
]
Output:
[
{ "order_id": "001", "total": 100 },
{ "order_id": "002", "total": 200 }
]
Use Case 3: Remove Exact Duplicates
Configuration:
Compare: All Fields
Input:
[
{ "name": "John", "age": 30 },
{ "name": "Jane", "age": 25 },
{ "name": "John", "age": 30 }
]
Output:
[
{ "name": "John", "age": 30 },
{ "name": "Jane", "age": 25 }
]
Use Case 4: Multiple Field Comparison
Configuration:
Compare: Selected Fields
Fields: first_name, last_name
Result: Removes duplicates where both first and last name match
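With multiple comparison fields, the duplicate key is effectively the combination of those fields. A sketch of that composite-key idea, using the first_name and last_name fields from this example:

```javascript
// Treat first_name + last_name together as the uniqueness key
const seen = new Set();
return $input.all().filter(item => {
  const key = `${item.json.first_name}|${item.json.last_name}`;
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
});
```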
Remove Duplicates Best Practices
- Place early in workflow to reduce processing
- Use specific fields for better control
- Consider case sensitivity in comparisons
- Test with your actual data structure
- Document which fields define uniqueness
Combining All Three Nodes
The real power comes from using these nodes together in your workflows.
Pattern 1: Filter → Remove Duplicates → Limit
1. Filter Node: Keep only active customers
2. Remove Duplicates: Remove duplicate emails
3. Limit Node: Take first 100 for processing
Result: 100 unique, active customers
Pattern 2: Remove Duplicates → Filter → Limit
1. Remove Duplicates: Remove duplicate orders
2. Filter Node: Keep orders > $500
3. Limit Node: Take the first 50 orders (add a Sort node before the Limit if you want the top 50 by value)
Result: 50 unique, high-value orders
Pattern 3: Limit → Filter → Remove Duplicates
1. Limit Node: Take first 1000 items (performance)
2. Filter Node: Keep only valid records
3. Remove Duplicates: Remove any duplicates
Result: Unique valid records from first 1000 items
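To make the ordering concrete, here is Pattern 1 collapsed into a single Code-node sketch. In a real workflow you would normally keep the three separate nodes, which are easier to read and debug; field names are illustrative:

```javascript
// Pattern 1 in one pass: Filter -> Remove Duplicates (by email) -> Limit
const seen = new Set();
return $input.all()
  .filter(item => item.json.status === 'active')   // keep active customers
  .filter(item => {                                 // drop duplicate emails
    if (seen.has(item.json.email)) return false;
    seen.add(item.json.email);
    return true;
  })
  .slice(0, 100);                                   // take the first 100
```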
Practical Workflow Examples
Example 1: Clean Customer Contact List
Workflow:
1. Get contacts from database
2. Filter: Remove contacts without email
3. Filter: Keep only verified contacts
4. Remove Duplicates: By email field
5. Limit: Process 500 per batch
6. Send to email marketing platform
Example 2: Process Recent Orders
Workflow:
1. Fetch orders from API
2. Filter: Orders from last 30 days
3. Filter: Status = "completed"
4. Remove Duplicates: By order_id
5. Limit: Top 100 orders
6. Generate report
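Step 2 of this workflow filters for orders from the last 30 days. A minimal Code-node sketch of that date check is below; the created_at field name is an assumption about the API response, so adjust it to match your data.

```javascript
// Keep only orders created within the last 30 days
const cutoff = Date.now() - 30 * 24 * 60 * 60 * 1000;
return $input.all().filter(item =>
  new Date(item.json.created_at).getTime() >= cutoff // created_at is illustrative
);
```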
Example 3: Lead Qualification
Workflow:
1. Get leads from multiple sources
2. Remove Duplicates: By email and phone
3. Filter: Score > 70
4. Filter: Country in target list
5. Limit: 50 leads per day
6. Assign to sales team
Performance Tips
Order Matters: Place nodes strategically
- Remove duplicates early to reduce data volume
- Filter before expensive operations
- Limit early when testing
Use Expressions: For complex filtering logic, for example:
{{ $json.value > 100 && $json.status === "active" }}
Batch Processing: Combine with Loop nodes
- Process data in chunks
- Prevent timeout errors
- Better resource management
Test Incrementally:
- Start with small Limit values
- Test filters with sample data
- Verify duplicate removal logic
Troubleshooting
Filter Not Working as Expected?
- Check field names match exactly (case-sensitive)
- Verify data types (string vs number)
- Test conditions with Execute Node
- Use expressions for complex logic
Remove Duplicates Not Removing Items?
- Verify field names are correct
- Check for trailing spaces or case differences
- Consider normalizing data before comparison
- Test with "All Fields" to see behavior
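A quick way to normalize before comparison is a Code node (or the Set node) that trims and lower-cases the field you deduplicate on. This sketch assumes an email field:

```javascript
// Normalize the email field so "John@Example.com " and "john@example.com" match
return $input.all().map(item => ({
  json: { ...item.json, email: (item.json.email ?? '').trim().toLowerCase() },
}));
```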
Limit Node Returning Wrong Items?
- Ensure data is sorted correctly
- Check if you need "First" or "Last" items
- Verify input data count
Key Benefits
- Data Quality: Clean and validated data flowing through workflows
- Performance: Process only necessary items, reducing costs and time
- Reliability: Prevent duplicate processing and errors
- Flexibility: Combine nodes for complex data manipulation
- Control: Precise control over data flow and processing
- Scalability: Handle large datasets efficiently
Common Use Cases Summary
| Node | Primary Use | When to Use |
|------|-------------|-------------|
| Limit | Control quantity | Testing, pagination, cost control |
| Filter | Control quality | Data validation, business rules |
| Remove Duplicates | Ensure uniqueness | Data consolidation, prevent duplicates |
Conclusion
The Limit, Filter, and Remove Duplicates nodes are essential tools for data management in n8n. By mastering these nodes, you can build more efficient, reliable, and cost-effective workflows that process exactly the data you need, exactly how you need it.
Start by using each node individually to understand their behavior, then combine them strategically to create powerful data processing pipelines. Remember: the order in which you use these nodes can significantly impact your workflow's performance and results.