Modern applications demand data models that can adapt to ever-changing requirements. E-commerce platforms, for instance, often feature product catalogs with a myriad of attributes that evolve over time. The challenge lies in designing a database that can accommodate these shifts without necessitating cumbersome migrations.
JSON (JavaScript Object Notation) offers a flexible solution by adapting to the shape of your data, thus eliminating the constraints of rigid schemas. As applications expand, the data model can evolve in tandem, providing a seamless experience.
PostgreSQL stands out by offering robust JSON capabilities while maintaining ACID compliance and advanced indexing. This makes it an attractive alternative to dedicated NoSQL solutions for handling JSON workloads. When used on AWS through fully managed services such as Amazon Relational Database Service (Amazon RDS) for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition, it delivers enterprise-grade features and integrates with other AWS services.
JSON capabilities of PostgreSQL
PostgreSQL supports JSON through two primary data types:
- JSON – This data type stores an exact copy of the input text, so the value must be reparsed on every access. It supports only functional indexes, not native JSONB indexing.
- JSONB – This type stores data in an optimized binary format, allowing for more efficient processing, albeit with slightly slower insertion speeds. JSONB offers extensive functionality.
The following table highlights the differences between these two data types:
| Feature | JSON | JSONB |
| --- | --- | --- |
| Storage format | Text-based; preserves whitespace and key order | Binary format optimized for processing; often slightly larger on disk than JSON |
| Insertion speed | Faster (no conversion overhead) | Slightly slower (requires binary conversion) |
| Query performance | Slower (requires parsing each time) | Significantly faster for most operations |
| Indexing support | Functional indexes can be added | Supports GIN indexing for efficient searches |
| Duplicate keys | Preserves duplicate keys | Removes duplicate keys (keeps the last value) |
| Memory usage | Generally higher for complex operations | More efficient memory utilization |
| Nested queries | Less efficient for deep nesting | Optimized for nested structure queries |
| Use case | When exact input preservation is critical | Most production applications needing performance |
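You can observe these differences directly by casting the same input to each type; this minimal illustration works in any PostgreSQL session:

```sql
-- json preserves the input text verbatim, including the duplicate key:
SELECT '{"a": 1, "a": 2}'::json;   -- {"a": 1, "a": 2}

-- jsonb normalizes the value, keeping only the last duplicate:
SELECT '{"a": 1, "a": 2}'::jsonb;  -- {"a": 2}
```

The jsonb cast also discards insignificant whitespace and does not guarantee the original key order, which is why JSON remains the right choice when byte-for-byte input preservation matters.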
Real-world use cases
PostgreSQL is particularly valuable for systems that require a transactional, object-relational database, offering extended data types, a rich SQL interface, and advanced search capabilities. Here are some notable use cases:
- Web and mobile applications – PostgreSQL is frequently chosen as the backend for dynamic websites and mobile apps that manage high traffic and demand strong transactional integrity due to its scalability and reliability.
- Content management systems (CMS) – Many CMS platforms leverage PostgreSQL for its ability to handle complex content structures and large databases.
- Scientific research and data analysis – Its support for complex data types and advanced analytics makes PostgreSQL suitable for scientific research and data analysis.
- Geospatial applications (GIS) – With the PostGIS extension, PostgreSQL serves as a powerful geospatial database for mapping and location services.
- Applications requiring advanced search features – PostgreSQL excels in applications that demand sophisticated search capabilities, such as job search engines, content recommendation systems, and product discovery platforms.
To illustrate PostgreSQL’s JSON capabilities within a CMS application, we will focus on a digital news magazine. This scenario is ideal for PostgreSQL’s JSON features, as news articles comprise both structured components (like titles and publication dates) and unstructured, variable components (metadata that varies by article type).
Prerequisites
To follow along with our examples, you will need an AWS account, PostgreSQL version 17 or later, and a basic understanding of SQL. For guidance on creating an Aurora database cluster, refer to Creating an Amazon Aurora DB cluster in the Amazon Aurora User Guide.
Configure data
We begin by establishing our database structure. The following table definition illustrates how PostgreSQL can accommodate both structured and semi-structured data:
CREATE TABLE json_articles (
article_id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
edition SMALLINT NOT NULL,
content TEXT NOT NULL,
metadata JSONB
);
-- Create a GIN index for efficient search on JSON fields
CREATE INDEX idx_news_articles_metadata_fields ON json_articles USING GIN (
(metadata->'tags') jsonb_path_ops,
(metadata->'categories') jsonb_path_ops,
(metadata->'author') jsonb_path_ops
);
PostgreSQL can combine results from multiple independent index scans (including GIN indexes) by generating bitmaps of potential row locations. In this example, the metadata field is of type JSONB, which improves performance in a CMS where editors frequently search and filter articles based on various metadata attributes. The GIN index we created accelerates containment queries (using the @> operator) that filter JSON data by tags and other metadata fields, a common operation in news platforms.
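You can confirm this behavior with EXPLAIN. The exact plan depends on table size and statistics, so on a table this small the planner may simply choose a sequential scan; with enough rows, you would see bitmap index scans combined by a BitmapOr node:

```sql
-- Containment checks on the indexed expressions can each use a separate
-- bitmap index scan, which PostgreSQL then combines:
EXPLAIN
SELECT title
FROM json_articles
WHERE metadata->'tags' @> '"policy"'
   OR metadata->'categories' @> '"science"';
```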
Next, we will insert some sample articles that might feature in our digital magazine:
INSERT INTO json_articles VALUES
(1,
'Climate Change Summit Results',
1,
'World leaders agreed on new emissions targets to fight global warming...',
'{
"author": {"name": "Jane Smith", "email": "jane@example.com"},
"categories": ["environment", "politics", "global"],
"tags": ["climate change", "summit", "emissions", "policy"],
"metrics": {"read_time": 4.5, "comments_count": 28, "shares": 156}
}'),
(2,
'Global Warming Effects on Marine Life',
1,
'Rising ocean temperatures are threatening coral reefs and marine biodiversity...',
'{
"author": {"name": "Mark Johnson", "email": "mark@example.com"},
"categories": ["environment", "science", "marine"],
"tags": ["global warming", "oceans", "marine life", "climate change"],
"metrics": {"read_time": 5.2, "comments_count": 42, "shares": 205}
}');
If we were to adopt a purely relational approach, we would need separate tables for authors, categories, tags, and metrics, with foreign keys linking everything together. By utilizing JSON, we can keep related data together, simplifying the retrieval of complete article information in a single query. Moreover, if we later decide to add new metadata fields (such as video_duration for video articles or interview_subject for interviews), we can do so without modifying the database schema. This flexibility is invaluable for a CMS that must adapt to evolving content types.
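For example, a hypothetical video article could carry a video_duration key with no schema change at all (the field name and values here are illustrative, not part of the earlier sample data):

```sql
INSERT INTO json_articles (title, edition, content, metadata)
VALUES (
  'Summit Highlights: Video Recap',
  2,
  'Watch the key moments from the climate summit...',
  '{
    "author": {"name": "Ana Lee", "email": "ana@example.com"},
    "categories": ["environment", "video"],
    "tags": ["climate change", "summit"],
    "video_duration": 312
  }'
);
```

Existing queries continue to work unchanged; rows without the new key simply return NULL when it is extracted.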
Query patterns and optimization
This section delves into the search capabilities PostgreSQL offers for querying JSON data.
Query with JSON path operators
In our CMS, editors may need to locate all environment-related articles mentioning “climate change” to create a special feature. This query combines traditional SQL with JSON path operators:
-- Example of a basic query combining JSON and text search
-- This query finds articles with specific tags in the environment category
SELECT title, metadata->>'author' AS author_name
FROM json_articles
WHERE metadata->'categories' ? 'environment'
AND metadata->'tags' ? 'climate change';
Output:
Title Author_name
-------------------------------------------- -------------------------------------------------------
Climate Change Summit Results {"name": "Jane Smith", "email": "jane@example.com"}
Global Warming Effects on Marine Life {"name": "Mark Johnson", "email": "mark@example.com"}
In this query:
- metadata->'categories' ? 'environment' uses the key-existence operator ? to check whether the categories array contains the string "environment".
- metadata->'tags' ? 'climate change' similarly checks whether "climate change" is present in the tags array.
- metadata->>'author' extracts the author field as text (note the ->> operator); because author is a JSON object, we receive the entire object as a string.
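Because -> returns a JSON value while ->> returns text, the two operators can be chained to reach nested values. For instance, this variation on the query above extracts just the author's name rather than the whole author object:

```sql
SELECT title, metadata->'author'->>'name' AS author_name
FROM json_articles
WHERE metadata->'tags' ? 'climate change';
```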
Transforming JSON into tabular data
CMS applications often require the extraction of structured data from JSON metadata for analytics. The JSON_TABLE function, defined in the SQL standard and available in PostgreSQL 17 and later, presents JSON data in a tabular format:
-- Using JSON_TABLE to extract structured data from JSON (PostgreSQL 17+ feature)
-- This demonstrates how to convert nested JSON data into a relational format
SELECT
a.article_id,
a.title,
jt.author_name,
jt.author_email,
jt.category
FROM
json_articles a,
JSON_TABLE(a.metadata, '$' COLUMNS (
author_name VARCHAR(100) PATH '$.author.name',
author_email VARCHAR(100) PATH '$.author.email',
NESTED PATH '$.categories[*]' COLUMNS (
category VARCHAR(50) PATH '$'
)
)) AS jt
WHERE
(a.metadata -> 'metrics' ->> 'read_time')::float < 5;
Output:
article_id title author_name author_email category
---------- ----------------------------- -------------- ------------------------ ---------------
1 Climate Change Summit Results Jane Smith jane@example.com environment
1 Climate Change Summit Results Jane Smith jane@example.com politics
1 Climate Change Summit Results Jane Smith jane@example.com global
This query employs the JSON_TABLE function with PATH and NESTED PATH expressions to flatten JSON data, providing a relational view that can be utilized for reporting or data export. For older PostgreSQL versions, similar results can be achieved using lateral joins:
-- Alternative approach for PostgreSQL versions before 17
-- This achieves similar results using lateral joins and jsonb_array_elements
SELECT
a.article_id,
a.title,
a.metadata -> 'author' ->> 'name' AS author_name,
a.metadata -> 'author' ->> 'email' AS author_email,
c.category
FROM
json_articles a,
LATERAL jsonb_array_elements_text(a.metadata -> 'categories') AS c(category)
WHERE
(a.metadata -> 'metrics' ->> 'read_time')::float < 5;
Output:
article_id title author_name author_email category
---------- ----------------------------- --------------- -------------------------- ----------
1 Climate Change Summit Results Jane Smith jane@example.com environment
1 Climate Change Summit Results Jane Smith jane@example.com politics
1 Climate Change Summit Results Jane Smith jane@example.com global
Full-text search with JSON data
Advanced search techniques are essential for CMS applications to efficiently search their JSON data. In this section, we demonstrate how our CMS can leverage PostgreSQL’s full-text search capabilities.
Readers of our CMS need to search article content effectively. PostgreSQL’s full-text search works seamlessly with JSON content, as illustrated in the following query:
-- Create a text search index on content
CREATE INDEX json_articles_content_idx ON json_articles
USING GIN (to_tsvector('english', content));
-- Full-text search example with JSON filtering
-- This query finds articles about "global" AND "warming" in the environment category
-- and highlights the matching terms in the content
SELECT
a.article_id,
a.title,
ts_headline('english', a.content, q) AS highlighted_content
FROM
json_articles a,
to_tsquery('english', 'global & warming') AS q
WHERE
to_tsvector('english', a.content) @@ q
AND a.metadata @> '{"categories": ["environment"]}'
ORDER BY
ts_rank(to_tsvector('english', a.content), q) DESC;
Output:
article_id title highlighted_content
---------- ----------------------------- -------------------------------------------------
1 Climate Change Summit Results World leaders agreed on new emissions targets to fight global warming...
This query showcases several advanced search features:
- Highlighting search terms in results
- Ranking articles by relevance
- Filtering by metadata attributes
- Combining full-text search with structured data conditions
The query incorporates the following components:
- Full-text search operators – The @@ operator checks whether a tsvector matches a tsquery.
- Text search functions – to_tsvector() converts text into a searchable vector, handling stemming and stop words; to_tsquery() parses a search query using Boolean operators; ts_headline() generates excerpts with the matching terms highlighted; ts_rank() ranks results by relevance.
- JSON containment – The @> operator checks whether the left JSON value contains the right JSON structure.
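PostgreSQL also supports phrase matching. This variant (assuming PostgreSQL 9.6 or later for phraseto_tsquery) requires "global" and "warming" to appear as adjacent words rather than anywhere in the document:

```sql
SELECT title
FROM json_articles
WHERE to_tsvector('english', content) @@ phraseto_tsquery('english', 'global warming')
  AND metadata @> '{"categories": ["environment"]}';
```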
More GIN indexing options
Our GIN examples use jsonb_path_ops, which is well suited to containment checks with @> but does not support the key-existence operator ? or the other operators of the default class. For key-existence checks and broader operator support, use the default jsonb_ops operator class instead.
-- Use jsonb_ops for full-path and composite key queries
CREATE INDEX idx_news_articles_metadata_ops ON json_articles
USING GIN (metadata jsonb_ops);
The technical distinction between jsonb_ops and jsonb_path_ops GIN indexes is that the former creates independent index items for each key and value in the data, while the latter generates index items only for each value. Generally, use each index type in the following scenarios:
- Use jsonb_path_ops for simple containment queries (smaller, faster indexes):
CREATE INDEX idx_articles_tags ON json_articles USING GIN ((metadata->'tags') jsonb_path_ops);
- Use jsonb_ops for key-existence checks, complex path queries, and composite operations:
CREATE INDEX idx_articles_metadata ON json_articles USING GIN (metadata jsonb_ops);
For further details on jsonb_path_ops and jsonb_ops, refer to JSON Types in the PostgreSQL documentation.
Now, you can execute SQL statements like the following:
SELECT title, metadata->'author'->>'name' AS author_name
FROM json_articles
WHERE metadata @> '{"categories": ["environment"]}'
AND metadata @> '{"tags": ["climate change"]}';
Optimizing GIN indexes
In addition to selecting the appropriate GIN operator class, consider the following general strategies for optimizing GIN indexes:
- Use expression and partial indexes:
- If you frequently query a specific key, create an expression index on that key. This employs a standard B-tree index, which is significantly faster for simple equality checks.
- Combine an expression index with a WHERE clause to index only a subset of your data. This is advantageous for large tables where only a fraction of rows needs indexing.
- Optimize bulk data loads – For large data loads, it is more efficient to load the data first and create the index afterward. Building the index in a single bulk operation is often faster than maintaining it incrementally with each INSERT.
- Tune gin_pending_list_limit – This configuration parameter governs the size of a GIN index's "pending list" and matters most for tables with high write traffic. Increasing it favors fast, frequent writes by deferring index merges; decreasing it keeps searches fast at the cost of more work per write.
- Monitor and tune your indexes regularly.
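The strategies above can be sketched as follows; the index names and the edition predicate are illustrative, not part of the earlier setup:

```sql
-- Expression index: a B-tree on a frequently queried scalar key,
-- much faster than a GIN index for simple equality checks
CREATE INDEX idx_articles_author_name
  ON json_articles ((metadata->'author'->>'name'));

-- Partial index: index the tags of first-edition articles only
CREATE INDEX idx_first_edition_tags
  ON json_articles USING GIN ((metadata->'tags') jsonb_path_ops)
  WHERE edition = 1;

-- Favor write throughput on a high-ingest index (value is in kilobytes)
ALTER INDEX idx_first_edition_tags SET (gin_pending_list_limit = 4096);
```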
Semantic and geospatial search capabilities
When your application requires semantic and geospatial search techniques alongside PostgreSQL's strengths, AWS recommends complementary architectural patterns. Common approaches include integrating PostgreSQL with Amazon OpenSearch Service, creating purpose-built search solutions that leverage the strengths of each service. The integration of PostgreSQL with OpenSearch, implemented through a pull or push approach with an Amazon OpenSearch Ingestion pipeline, provides a fully managed, no-code experience for ingesting data into OpenSearch.
Utilizing separate, optimized services like Amazon OpenSearch Service for vector and geospatial search allows PostgreSQL to focus on tasks where it excels, such as complex queries and maintaining transactional integrity. Amazon OpenSearch Service is designed to handle high-dimensional vector embeddings and intricate geospatial queries, making it a more efficient choice for those specific workloads.
Clean up
Once you have finished experimenting with this solution, be sure to clean up the resources you created to avoid incurring additional charges.
AWS-specific benefits
While PostgreSQL has built-in JSON capabilities, Aurora PostgreSQL-Compatible and Amazon RDS for PostgreSQL provide infrastructure and management features that significantly enhance how effectively you can work with JSON data:
- Distributed storage architecture – Aurora’s storage layer mitigates I/O bottlenecks when processing large JSON documents, allowing for more efficient read/write operations. The storage automatically expands without manual intervention, which is ideal for applications with unpredictable JSON data growth.
- Aurora cluster cache management – The shared buffer cache across instances enhances performance for frequently accessed JSON documents. Buffer cache persistence across instance restarts prevents performance degradation after maintenance events.
- Parallel query processing – PostgreSQL’s parallel query processing can enhance performance for certain JSON operations by distributing work across multiple CPU cores. This feature is particularly effective for analytical queries on large datasets, although complex JSONB path operations and GIN index lookups may not always benefit from parallelization.
- Read replicas with minimal lag – Aurora replication enables up to 15 read replicas with minimal lag, facilitating effective scaling of JSON read operations. Amazon Aurora Serverless v2 dynamically scales to accommodate varying JSON read operations, automatically adjusting compute capacity as your workload fluctuates.
- Instance type flexibility – You can choose from various instance types:
- Memory-optimized instances for applications with large JSON documents that benefit from in-memory processing.
- Burstable-performance instances that are cost-effective for workloads with moderate CPU usage and occasional spikes.
- Read-optimized instances for high-throughput JSON ingestion workloads.
- CloudWatch Database Insights – Aurora PostgreSQL-Compatible and Amazon RDS for PostgreSQL offer detailed monitoring that is particularly useful for identifying bottlenecks in complex JSON operations.
Best practices
By adhering to best practices, you can effectively utilize PostgreSQL’s JSON capabilities while maintaining performance and data integrity in your AWS-hosted databases.
- Strategic data modeling:
- Use JSONB for most use cases unless exact text preservation is necessary.
- Normalize frequently queried fields into regular columns for improved performance.
- Consider partial normalization where some structured data resides in columns while variable data remains in JSON.
- Indexing strategies:
- Create GIN indexes for frequently queried JSON paths.
- Utilize expression indexes for specific JSON properties.
- Monitor index usage and remove any unused indexes that may slow down writes.
- Query optimization:
- Employ containment operators @> and <@ with GIN indexes for optimal performance.
- Leverage path operators -> and ->> to avoid extracting entire documents.
- Cast extracted values to appropriate types when performing comparisons.
- Performance considerations:
- Avoid overusing JSON for data that would be better suited to relational columns.
- Use jsonb_path_ops for GIN indexes when only the containment operator is utilized.
- Batch update JSON documents instead of frequently updating individual fields.
- Perform targeted path updates rather than updating the full document.
- Avoid query anti-patterns:
- Avoid scanning all documents when more selective filters could be applied.
- Do not extract entire JSON documents when only specific fields are required.
- Avoid joining on JSON fields without proper indexing or type casting.
- Schema evolution:
- Validate your JSON schema at the database level using check constraints to ensure JSON schema validity.
- Recent versions of PostgreSQL offer predicates such as IS JSON, IS JSON ARRAY, and IS JSON OBJECT for simple validity checks.
- Application-level validation can also be performed using JSON Schema or similar tools.
- Storage considerations:
- Avoid document bloat by removing obsolete fields.
- Do not store data in both JSON and regular columns.
- AWS-specific considerations:
- Utilize CloudWatch Database Insights to identify bottlenecks in JSON operations.
- Account for JSON storage size, as it can be less compact than normalized tables.
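The schema-evolution guidance above can be sketched with check constraints; the constraint names are illustrative, and the IS JSON predicate requires PostgreSQL 16 or later:

```sql
-- Enforce at the database level that metadata is a JSON object
-- and always carries an author key
ALTER TABLE json_articles
  ADD CONSTRAINT metadata_is_object CHECK (metadata IS JSON OBJECT),
  ADD CONSTRAINT metadata_has_author CHECK (metadata ? 'author');
```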