Implementing Search with Elasticsearch
When your application grows to millions of records, SQL LIKE queries just don't cut it. They're slow, they don't understand relevance, and they can't handle typos, synonyms, or complex filtering. Elasticsearch is the industry-standard solution: a distributed, RESTful search engine built on Apache Lucene that powers search at companies like GitHub, Wikipedia, Netflix, and Uber. In this guide you'll go from zero to a production-ready search implementation — covering installation, index design, query DSL, aggregations, and full PHP integration.
A SELECT * FROM products WHERE name LIKE '%laptop%' scans every row and can't use indexes. Elasticsearch uses an inverted index — the same structure as a book's index — so lookups stay fast even as the dataset grows into the millions of documents, and it scores results by relevance automatically.
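To make the inverted-index idea concrete, here is a toy sketch in Python (illustrative only; Lucene's real implementation adds term dictionaries, postings compression, positions, and scoring):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

docs = {
    1: "MacBook Pro laptop",
    2: "Dell XPS laptop",
    3: "Sony headphones",
}
index = build_inverted_index(docs)

# Lookup is a single hash access, independent of corpus size
print(sorted(index["laptop"]))  # [1, 2]
```

Instead of scanning every document for "laptop", the engine jumps straight to the postings list for that term; that is why query time barely changes as the index grows.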
How Elasticsearch Works
Elasticsearch stores data as JSON documents inside indices (analogous to database tables). Each index is split into shards — smaller Lucene indexes distributed across cluster nodes. This horizontal scaling is what allows Elasticsearch to handle billions of documents.
The key concepts you need to understand:
- Index: A collection of documents with a common schema (like a database table)
- Document: A single JSON record stored in an index
- Mapping: The schema that defines field types (text, keyword, date, integer, etc.)
- Shard: A single Lucene instance — indices are divided into shards for distribution
- Replica: A copy of a shard for fault tolerance and read performance
- Node: A single Elasticsearch server instance
- Cluster: One or more nodes working together
Figure: Elasticsearch cluster with 3 nodes, 3 primary shards (P0–P2), and 3 replicas
Installation and Setup
Elasticsearch runs on the JVM (recent versions bundle their own JDK) and is available for all major platforms. The easiest way to run it locally is via Docker, which avoids JVM configuration entirely.
Run Elasticsearch with Docker
The fastest way to get started — no JDK setup required, isolated from your system.
# Pull and run Elasticsearch 8.x (single-node dev mode)
docker run -d \
--name elasticsearch \
-p 9200:9200 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
docker.elastic.co/elasticsearch/elasticsearch:8.12.0
# Verify it's running
curl http://localhost:9200
# Expected output:
# {
# "name" : "...",
# "cluster_name" : "docker-cluster",
# "version" : { "number" : "8.12.0", ... }
# }
Or Install Natively (Ubuntu/Debian)
For production servers, native installation integrates with systemd for automatic restarts.
# Import GPG key
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
# Add repository
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
# Install
sudo apt update && sudo apt install elasticsearch
# Enable and start
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
Elasticsearch 8.x ships with TLS and authentication enabled by default. The xpack.security.enabled=false flag used above is only for local development. In production, always enable security and place Elasticsearch behind a firewall — never expose port 9200 to the public internet.
Designing Your Index Mapping
Mapping is Elasticsearch's equivalent of a database schema. Getting it right upfront matters because most mapping changes require a full reindex — you can't change a field's type after documents are indexed. The two most important field types are:
- text: Full-text search — analyzed, tokenized, stemmed. Use for blog content, product descriptions, anything you want to search "inside."
- keyword: Exact match — not analyzed. Use for filtering, sorting, aggregations (categories, status, IDs).
PUT /products
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"analysis": {
"analyzer": {
"product_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "asciifolding", "stop"]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "product_analyzer",
"fields": {
"keyword": { "type": "keyword" }
}
},
"description": { "type": "text", "analyzer": "product_analyzer" },
"category": { "type": "keyword" },
"price": { "type": "float" },
"rating": { "type": "float" },
"in_stock": { "type": "boolean" },
"tags": { "type": "keyword" },
"created_at": { "type": "date" }
}
}
}
The name field uses a multi-field mapping — name for full-text search and name.keyword for exact match and sorting. This is a common pattern that avoids having to define two separate fields.
| Field Type | Use Case | Analyzable | Aggregatable |
|---|---|---|---|
| text | Full-text search (articles, descriptions) | Yes | No |
| keyword | Filtering, facets, exact match | No | Yes |
| integer / float | Numeric ranges, sorting, math | No | Yes |
| date | Date ranges, time-based sorting | No | Yes |
| boolean | Binary flags | No | Yes |
| nested | Arrays of objects with independent queries | Depends | Yes |
Indexing Documents
Documents are added to Elasticsearch via a simple HTTP request. With POST, Elasticsearch auto-generates the document ID; with PUT you provide your own (typically the database primary key, so you can sync updates later).
# Index a document with explicit ID (recommended — use DB primary key)
PUT /products/_doc/1
{
"name": "MacBook Pro 16-inch M3",
"description": "Apple's most powerful laptop with M3 Pro chip, 18GB RAM, and 512GB SSD.",
"category": "Laptops",
"price": 2499.99,
"rating": 4.8,
"in_stock": true,
"tags": ["apple", "laptop", "m3", "professional"],
"created_at": "2026-01-15"
}
# Update a specific field (partial update)
POST /products/_update/1
{
"doc": {
"price": 2299.99,
"in_stock": false
}
}
# Delete a document
DELETE /products/_doc/1
# Bulk indexing (much more efficient for large datasets)
POST /_bulk
{ "index": { "_index": "products", "_id": "2" } }
{ "name": "Dell XPS 15", "category": "Laptops", "price": 1799.99, "rating": 4.5, "in_stock": true, "tags": ["dell", "laptop"] }
{ "index": { "_index": "products", "_id": "3" } }
{ "name": "Sony WH-1000XM5", "category": "Headphones", "price": 349.99, "rating": 4.9, "in_stock": true, "tags": ["sony", "headphones", "noise-cancelling"] }
For large initial imports, always use the Bulk API. Sending documents one at a time creates significant HTTP overhead — bulk requests can be 10–100× faster.
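The _bulk body is newline-delimited JSON: an action line followed by the document source, ending with a trailing newline. A small Python helper (a sketch; in practice the official clients ship bulk helpers that do this for you) makes the format explicit:

```python
import json

def to_bulk_ndjson(index, docs):
    """Serialize docs (dicts containing an 'id' key) into a _bulk request body."""
    lines = []
    for doc in docs:
        doc = dict(doc)  # copy so popping the id doesn't mutate the caller's dict
        action = {"index": {"_index": index, "_id": str(doc.pop("id"))}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline

body = to_bulk_ndjson("products", [
    {"id": 2, "name": "Dell XPS 15", "price": 1799.99},
    {"id": 3, "name": "Sony WH-1000XM5", "price": 349.99},
])
# POST the body to http://localhost:9200/_bulk
# with header Content-Type: application/x-ndjson
```

One HTTP round trip now carries the whole batch, which is where the 10–100× speedup over per-document requests comes from.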
Query DSL: Searching Your Data
Elasticsearch's Query DSL (Domain Specific Language) is a JSON-based query language. The two fundamental concepts are:
- Query context: "How well does this document match?" — scores documents by relevance
- Filter context: "Does this document match?" — yes/no, no scoring, results are cached
Match Query — Basic Full-Text Search
# Simple match — searches a single field
GET /products/_search
{
"query": {
"match": {
"name": "macbook pro"
}
}
}
# Multi-match — search across multiple fields with field boosting
GET /products/_search
{
"query": {
"multi_match": {
"query": "macbook pro",
"fields": ["name^3", "description", "tags^2"],
"type": "best_fields",
"fuzziness": "AUTO"
}
},
"_source": ["name", "price", "rating"],
"from": 0,
"size": 10
}
# name^3 means "name" is 3x more important than description
# fuzziness: AUTO handles typos (e.g. "macbok" still finds "MacBook")
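fuzziness: AUTO maps each query term's length to a maximum Levenshtein edit distance: 0 edits for terms of 1–2 characters, 1 edit for 3–5, and 2 edits for 6 or more. A quick Python check of why "macbok" still finds "MacBook":

```python
def auto_fuzziness(term):
    """Max edit distance that Elasticsearch's fuzziness AUTO allows for a term."""
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

typo, target = "macbok", "macbook"
print(levenshtein(typo, target))  # 1 (one missing letter)
print(auto_fuzziness(typo))       # 2, so the typo is within the allowed distance
```

The length-based cutoff keeps short terms strict (so "cat" doesn't match "car" with two edits) while tolerating realistic typos in longer words.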
Bool Query — Combining Conditions
The bool query is the workhorse of Elasticsearch. It combines other queries with four clauses:
- must: Document must match — contributes to score
- should: Document may match — boosts score if it does
- must_not: Document must NOT match — in filter context (no scoring)
- filter: Document must match — in filter context (cached, no scoring)
GET /products/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "laptop",
"fields": ["name^3", "description"],
"fuzziness": "AUTO"
}
}
],
"filter": [
{ "term": { "in_stock": true } },
{ "term": { "category": "Laptops" } },
{
"range": {
"price": { "gte": 1000, "lte": 3000 }
}
},
{
"range": {
"rating": { "gte": 4.0 }
}
}
],
"must_not": [
{ "term": { "tags": "refurbished" } }
]
}
},
"sort": [
{ "_score": { "order": "desc" } },
{ "rating": { "order": "desc" } }
],
"from": 0,
"size": 20
}
Notice that the full-text search is in must (scores matter) while category, price, stock, and rating are in filter (binary match, cached). This is the correct pattern — filters are significantly faster because their results are cached by Elasticsearch.
Highlight and Autocomplete
# Highlight matching terms in results
GET /products/_search
{
"query": {
"match": { "description": "M3 chip performance" }
},
"highlight": {
"fields": {
"description": {
"pre_tags": ["<mark>"],
"post_tags": ["</mark>"],
"number_of_fragments": 3,
"fragment_size": 150
}
}
}
}
# Completion suggester (autocomplete as-you-type)
# First, add completion field to mapping:
# "name_suggest": { "type": "completion" }
POST /products/_search
{
"suggest": {
"product-suggest": {
"prefix": "mac",
"completion": {
"field": "name_suggest",
"size": 5,
"skip_duplicates": true
}
}
}
}
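Internally, the completion suggester stores suggestions in an in-memory FST built for fast prefix lookups. Conceptually it behaves like a scan over sorted terms (a toy Python illustration, not the real data structure):

```python
import bisect

def suggest(sorted_terms, prefix, size=5):
    """Return up to `size` terms starting with `prefix` from a sorted list."""
    lo = bisect.bisect_left(sorted_terms, prefix)  # jump to the first candidate
    out = []
    for term in sorted_terms[lo:]:
        if not term.startswith(prefix) or len(out) == size:
            break
        out.append(term)
    return out

terms = sorted(["macbook pro", "macbook air", "mac mini", "magic mouse", "dell xps"])
print(suggest(terms, "mac"))  # ['mac mini', 'macbook air', 'macbook pro']
```

Because all terms sharing a prefix are contiguous in sorted order, a suggestion query never touches the rest of the vocabulary; this is what makes as-you-type latency feasible.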
Aggregations: Analytics and Facets
Aggregations are Elasticsearch's analytics engine — they let you calculate statistics, build faceted navigation (like e-commerce filters), and generate histograms, all in a single query. Think of them as SQL's GROUP BY on steroids, running in parallel across all shards.
GET /products/_search
{
"query": {
"bool": {
"must": { "match": { "name": "laptop" } },
"filter": { "term": { "in_stock": true } }
}
},
"aggs": {
"by_category": {
"terms": {
"field": "category",
"size": 10
}
},
"price_ranges": {
"range": {
"field": "price",
"ranges": [
{ "to": 500, "key": "Under $500" },
{ "from": 500, "to": 1000, "key": "$500 - $1000" },
{ "from": 1000, "to": 2000, "key": "$1000 - $2000" },
{ "from": 2000, "key": "Over $2000" }
]
}
},
"avg_rating": {
"avg": { "field": "rating" }
},
"price_stats": {
"stats": { "field": "price" }
},
"top_tags": {
"terms": { "field": "tags", "size": 20 }
}
},
"size": 10
}
# Response includes:
# hits.hits[] — matched documents
# aggregations.by_category.buckets[] — facet counts per category
# aggregations.price_ranges.buckets[] — count per price range
# aggregations.avg_rating.value — 4.6
# aggregations.price_stats — { min, max, avg, sum, count }
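Each terms aggregation comes back as a list of buckets, which map directly onto UI facets. A sketch of flattening a simplified, hypothetical response into filter options:

```python
response = {  # trimmed example of the shape a search response could take
    "aggregations": {
        "by_category": {"buckets": [
            {"key": "Laptops", "doc_count": 42},
            {"key": "Headphones", "doc_count": 17},
        ]},
        "avg_rating": {"value": 4.6},
    }
}

def facet_options(response, agg_name):
    """Flatten a terms aggregation into (label, count) pairs for a filter UI."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]

print(facet_options(response, "by_category"))
# [('Laptops', 42), ('Headphones', 17)]
```

Since the hits and the aggregations arrive in the same response, one query renders both the result list and every sidebar facet.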
PHP and Laravel Integration
The official Elasticsearch PHP client makes it easy to interact with Elasticsearch from your Laravel or Symfony application. Install it via Composer:
composer require elasticsearch/elasticsearch
Service Class Pattern
<?php
namespace App\Services;
use Elastic\Elasticsearch\ClientBuilder;
use Elastic\Elasticsearch\Client;
class SearchService
{
private Client $client;
private string $index = 'products';
public function __construct()
{
$this->client = ClientBuilder::create()
->setHosts([config('services.elasticsearch.host', 'localhost:9200')])
->build();
}
public function indexProduct(array $product): void
{
$this->client->index([
'index' => $this->index,
'id' => $product['id'],
'body' => [
'name' => $product['name'],
'description' => $product['description'],
'category' => $product['category'],
'price' => (float) $product['price'],
'rating' => (float) $product['rating'],
'in_stock' => (bool) $product['in_stock'],
'tags' => $product['tags'] ?? [],
'created_at' => $product['created_at'],
],
]);
}
public function deleteProduct(int $id): void
{
$this->client->delete([
'index' => $this->index,
'id' => $id,
]);
}
public function search(
string $query,
array $filters = [],
int $page = 1,
int $perPage = 20
): array {
$must = [];
$filterClauses = [];
if (!empty($query)) {
$must[] = [
'multi_match' => [
'query' => $query,
'fields' => ['name^3', 'description', 'tags^2'],
'fuzziness' => 'AUTO',
],
];
} else {
$must[] = ['match_all' => new \stdClass()];
}
if (!empty($filters['category'])) {
$filterClauses[] = ['term' => ['category' => $filters['category']]];
}
if (isset($filters['in_stock']) && $filters['in_stock']) {
$filterClauses[] = ['term' => ['in_stock' => true]];
}
if (!empty($filters['min_price']) || !empty($filters['max_price'])) {
$range = [];
if (!empty($filters['min_price'])) $range['gte'] = (float) $filters['min_price'];
if (!empty($filters['max_price'])) $range['lte'] = (float) $filters['max_price'];
$filterClauses[] = ['range' => ['price' => $range]];
}
$params = [
'index' => $this->index,
'body' => [
'query' => [
'bool' => [
'must' => $must,
'filter' => $filterClauses,
],
],
'aggs' => [
'categories' => ['terms' => ['field' => 'category', 'size' => 20]],
'price_ranges' => [
'range' => [
'field' => 'price',
'ranges' => [
['to' => 500],
['from' => 500, 'to' => 1000],
['from' => 1000, 'to' => 2000],
['from' => 2000],
],
],
],
'avg_rating' => ['avg' => ['field' => 'rating']],
],
'highlight' => [
'fields' => [
'name' => ['number_of_fragments' => 0],
'description' => ['number_of_fragments' => 2, 'fragment_size' => 150],
],
],
'from' => ($page - 1) * $perPage,
'size' => $perPage,
'sort' => [
['_score' => ['order' => 'desc']],
['rating' => ['order' => 'desc']],
],
],
];
$response = $this->client->search($params);
return [
'total' => $response['hits']['total']['value'],
'hits' => $response['hits']['hits'],
'aggregations' => $response['aggregations'] ?? [],
'took_ms' => $response['took'],
];
}
}
Keeping Elasticsearch in Sync with MySQL
Elasticsearch should be treated as a read-optimized replica of your primary database, not the source of truth. The most reliable sync pattern uses Laravel's Model Observers to automatically index changes:
<?php
namespace App\Observers;
use App\Models\Product;
use App\Services\SearchService;
class ProductObserver
{
public function __construct(private SearchService $search) {}
public function saved(Product $product): void
{
$this->search->indexProduct($product->toArray());
}
public function deleted(Product $product): void
{
$this->search->deleteProduct($product->id);
}
}
// Register in AppServiceProvider::boot():
// Product::observe(ProductObserver::class);
Initial Bulk Import
For existing data, create an Artisan command that streams records from MySQL and bulk-indexes them into Elasticsearch in batches of 500–1000.
Real-Time Sync via Observer
The Model Observer handles all future creates, updates, and deletes automatically — no extra code needed in your controllers.
Nightly Re-Index (Optional)
A scheduled Laravel command that re-indexes all records catches any edge cases where the observer sync was skipped (e.g., bulk DB updates that bypass Eloquent).
Artisan Bulk Reindex Command
<?php
namespace App\Console\Commands;
use App\Models\Product;
use App\Services\SearchService;
use Illuminate\Console\Command;
class ReindexProducts extends Command
{
protected $signature = 'search:reindex {--fresh : Delete index before reindexing}';
protected $description = 'Reindex all products in Elasticsearch';
public function handle(SearchService $search): int
{
if ($this->option('fresh')) {
// Drop and recreate the index before reindexing. Assumes a
// recreateIndex() helper on SearchService (hypothetical, not shown above).
$search->recreateIndex();
}
$total = Product::count();
$bar = $this->output->createProgressBar($total);
Product::query()
->with('category')
->chunkById(500, function ($products) use ($search, $bar) {
// For maximum throughput, collect each chunk into a single
// Bulk API request rather than indexing one document at a time
foreach ($products as $product) {
$search->indexProduct($product->toArray());
$bar->advance();
}
});
$bar->finish();
$this->newLine();
$this->info("Indexed {$total} products successfully.");
return Command::SUCCESS;
}
}
Performance Best Practices
Elasticsearch is fast by default, but a few key decisions determine whether you'll get millisecond or second response times at scale:
- Use filter context for exact matches: Filters are cached; queries are not. Put category, price range, and boolean filters inside filter, not must.
- Avoid deep pagination: from: 10000 is very expensive — Elasticsearch must retrieve and discard 10,000 documents on every shard. Use search_after cursor-based pagination for deep results.
- Right-size your shards: Aim for 20–40 GB per shard. Too many small shards create overhead; too few large shards limit parallelism.
- Disable _source for large fields: If you only need IDs from Elasticsearch (fetching full data from MySQL), disable _source or use stored_fields to avoid storing large text twice.
- Set refresh_interval to 30s during bulk indexing: The default 1-second refresh is expensive during large imports. Set "refresh_interval": "30s" while loading and restore it to "1s" after.
- Use aliases for zero-downtime reindexing: Point your application at an alias (products) rather than the index directly. When you reindex, swap the alias atomically to the new index.
Create products_v2, index all data into it, then atomically swap the products alias: POST /_aliases with {"actions": [{"remove": {"index": "products_v1", "alias": "products"}}, {"add": {"index": "products_v2", "alias": "products"}}]}. Your application never sees an interruption.
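The search_after pattern mentioned above replaces from/size offsets with a cursor: each request carries the sort values of the previous page's last hit. A Python sketch of building the successive request bodies (pure dict manipulation; the id tiebreaker field is an assumption about your mapping, and sending the bodies to Elasticsearch is left out):

```python
def first_page(query, size=20):
    """Initial request: a deterministic sort with a unique tiebreaker field."""
    return {
        "query": query,
        "size": size,
        # 'id' is assumed to be a unique field in the mapping; it breaks
        # rating ties so the cursor position is unambiguous
        "sort": [{"rating": "desc"}, {"id": "asc"}],
    }

def next_page(prev_body, last_hit):
    """Follow-up request carrying the previous page's last sort values."""
    body = dict(prev_body)
    body["search_after"] = last_hit["sort"]
    return body

page1 = first_page({"match": {"name": "laptop"}})
last_hit = {"sort": [4.5, "42"]}  # hypothetical last hit returned for page 1
page2 = next_page(page1, last_hit)
print(page2["search_after"])  # [4.5, '42']
```

Each page seeks directly to its starting point in sort order, so page 500 costs roughly the same as page 1 instead of forcing every shard to buffer and discard thousands of hits.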
Conclusion
Elasticsearch transforms search from a slow SQL afterthought into a first-class feature. Here's what you've learned in this guide:
Key Takeaways
- Architecture: Elasticsearch uses an inverted index distributed across shards for millisecond-level full-text search at large scale
- Mapping: Choose text for full-text search and keyword for filtering/aggregations — getting this right upfront avoids painful reindexes
- Bool Query: Combine must (scored) with filter (cached) clauses for maximum performance
- Aggregations: Power faceted navigation — categories, price ranges, ratings — all in a single query
- PHP Integration: Use Model Observers to keep Elasticsearch in sync with your primary database automatically
- Performance: Use filters over queries for exact matches, avoid deep pagination, and use aliases for zero-downtime reindexing
"Elasticsearch is not just a search engine — it's your analytics layer, your autocomplete engine, and your log aggregator all in one. Master the Query DSL and you've unlocked one of the most powerful tools in the backend developer's toolkit."
The patterns in this guide — the index mapping, the bool query structure, the PHP service class, and the Observer-based sync — are battle-tested in production at scale. Start with a single node locally, validate your mapping and queries, then scale horizontally as your data grows. Elasticsearch handles the distribution automatically.