Using Amazon OpenSearch Serverless in AWS IoT Core


Amazon OpenSearch Serverless became generally available on Jan 25, 2023. In this post I summarize whether Managed (EC2) or Serverless is the better fit for IoT solutions, and what you can do with OpenSearch Serverless together with AWS IoT Core.

Amazon OpenSearch Serverless is now generally available!

What is OpenSearch?

OpenSearch is an open-source search and analytics suite used for a wide range of use cases, including real-time application monitoring, log analysis, and website search.
OpenSearch provides a highly scalable system for fast access and response to large volumes of data, with OpenSearch Dashboards as its integrated visualization tool.
Amazon OpenSearch Service is the AWS managed service for OpenSearch.

What is Amazon OpenSearch Service?

Purpose

Send telemetry to OpenSearch so that device telemetry can be searched and analyzed with near real-time queries.
Summarize whether a managed Amazon OpenSearch Service domain or OpenSearch Serverless is better for managing MQTT telemetry with AWS IoT Core.

OpenSearch EC2 vs. OpenSearch Serverless

The following summarizes what each can and cannot do.

| Feature | OpenSearch EC2 | OpenSearch Serverless |
| --- | --- | --- |
| Supported OpenSearch Operations | Supports a subset of all OpenSearch API operations. | OpenSearch Serverless supports a different subset of OpenSearch API operations. Advanced OpenSearch features such as alerts, anomaly detection, k-NN, etc. are not supported. |
| Sign in to the Dashboard | Sign in with username and password. Cognito authentication is also available. | Sign in with IAM access key and secret key. Cognito authentication is not supported. |
| Upgrade OpenSearch | Upgrade domain manually. You are responsible for ensuring that the domain meets upgrade requirements and addresses significant changes. | Automatically upgrade your collection to the new OpenSearch version. Upgrades are not always performed as soon as a new version becomes available. |
| Cluster Management | Must manage master nodes and data nodes (EC2, EBS). | No need to manage collections because they are managed clusters. |
| Suitable Use Cases | In scenarios that require strict control over cluster configuration or specific customizations, it is better to use a managed cluster. You can choose your preferred instance and version, and have fine control over settings such as shorter refresh intervals and data-sharding strategies. | Easy to implement. No need to think about sizing, monitoring, and tuning OpenSearch clusters for periodic, intermittent, or unpredictable workloads. |

Pricing

Amazon OpenSearch Service Pricing

OpenSearch EC2 (monthly)

The hourly rate varies depending on the instance type. In addition, redundant configurations increase the number of instances, which increases the fee.

Basic billing scope:

  • Data node instance fee

  • Master node instance fee

  • Storage (EBS volume) fees

  • Standard AWS data transfer fee

  • Reserved Instance fees, if you purchase them

Example of charges

Assume a production-grade instance type deployed across three Availability Zones (3AZ) and running 24 hours a day, 31 days a month. In production you have to over-provision for peak loads. Reserved Instances would be cheaper, but device activity is not predictable, so they are left out of this calculation. The instance type can be changed later.

As a first pass, the calculation below uses on-demand pricing with the general-purpose m6g instance type. This is not necessarily the optimal configuration.

m6g.large.search (3AZ configuration) 959.29 USD
# Data node instance fee *621.96 USD*
- Tokyo Region m6g.large.search = 0.142 USD per hour
- Hours of use: 730 hours per month
- Number of nodes: 6 (3AZ configuration)
- 100% utilization/month
- 0.142 USD * 730 hours * 6 nodes = 621.96 USD

# Master node instance fee *310.98 USD*
- Tokyo Region m6g.large.search = 0.142 USD per hour
- Hours of use: 730 hours per month
- Number of nodes: 3 (3AZ configuration)
- 0.142 USD * 730 hours * 3 nodes = 310.98 USD

# Storage cost *26.35 USD*
- Tokyo Region EBS general-purpose SSD (gp3) storage 0.1464 USD/month/GB
- Number of instances: 6
- Provisioned IOPS per volume: 3000 IOPS (minimum)
- Provisioned throughput per volume: 125 MB/s (minimum)
- Capacity used: 30 GB/month
- 0.1464 USD * 30 GB * 6 instances = 26.35 USD

m6g.large.search (2AZ configuration) 743.19 USD
# Data node instance fee *414.64 USD*
- Tokyo Region m6g.large.search = 0.142 USD per hour
- Hours of use: 730 hours per month
- Number of nodes: 4 (2AZ configuration)
- 0.142 USD * 730 hours * 4 nodes = 414.64 USD

# Master node instance fee *310.98 USD*
- Tokyo Region m6g.large.search = 0.142 USD per hour
- Hours of use: 730 hours per month
- Number of nodes: 3 (2AZ configuration)
- 0.142 USD * 730 hours * 3 nodes = 310.98 USD

# Storage cost *17.57 USD*
- Tokyo Region EBS general-purpose SSD (gp3) storage 0.1464 USD/month/GB
- Number of instances: 4
- Provisioned IOPS per volume: 3000 IOPS (minimum)
- Provisioned throughput per volume: 125 MB/s (minimum)
- Capacity used: 30 GB/month
- 0.1464 USD * 30 GB * 4 instances = 17.57 USD

This can be cheaper depending on the instance type, but once the system is live, unanticipated usage will require resizing and scaling adjustments, and fees may increase.
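
The 3AZ estimate can be reproduced with a few lines of Python. This is a minimal sketch, assuming every node runs for all 730 hours of the month. The hourly rate of 0.142 USD is an assumption back-computed from the 621.96 USD data node fee (621.96 / (730 * 6)), and the function and constant names are illustrative only.

```python
HOURS_PER_MONTH = 730  # hours of use per month, as in the estimate above


def instance_fee(hourly_rate_usd: float, node_count: int) -> float:
    """Monthly fee for a group of identical nodes running 100% of the time."""
    return round(hourly_rate_usd * HOURS_PER_MONTH * node_count, 2)


def storage_fee(rate_per_gb_month_usd: float, gb_per_node: float, node_count: int) -> float:
    """Monthly EBS fee when every data node carries the same volume size."""
    return round(rate_per_gb_month_usd * gb_per_node * node_count, 2)


# Assumption: rate implied by the 621.96 USD data node fee, not a quoted price.
ASSUMED_HOURLY_RATE = 0.142
GP3_RATE_PER_GB_MONTH = 0.1464

data_nodes = instance_fee(ASSUMED_HOURLY_RATE, node_count=6)
master_nodes = instance_fee(ASSUMED_HOURLY_RATE, node_count=3)
storage = storage_fee(GP3_RATE_PER_GB_MONTH, gb_per_node=30, node_count=6)
total = round(data_nodes + master_nodes + storage, 2)

print(f"data nodes:   {data_nodes:.2f} USD")
print(f"master nodes: {master_nodes:.2f} USD")
print(f"storage:      {storage:.2f} USD")
print(f"total:        {total:.2f} USD")
```

Changing `node_count` to 4 data nodes gives the 2AZ variant. Check the current price list before relying on any of these rates.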

OpenSearch Serverless (monthly)

You only pay for the resources consumed by the workload; OpenSearch Serverless bills compute and storage separately.

| Resource Type | Fees (Tokyo Region) |
| --- | --- |
| OpenSearch Compute Unit (OCU) - Indexing | 0.334 USD per OCU per hour |
| OpenSearch Compute Unit (OCU) - Search and Query | 0.334 USD per OCU per hour |
| Managed Storage | 0.026 USD per GB per month |

Example of charges

Assume the collection is active 24 hours a day for 31 days.

The first collection in an account is billed for a minimum of 4 OCUs: 2 for indexing (primary and standby) and 2 for search (including 1 replica for HA).
In other words, the minimum for each of indexing and search is 2 OCUs.

Search and query charges
  • 0.334 * 2 (minimum 2 OCUs, 12 GB RAM) * 24 (hours) * 31 (days) = 496.992 USD/month

  • 0.334 * 4 (4 OCUs, 24 GB RAM) * 24 (hours) * 31 (days) = 993.984 USD/month

  • 0.334 * 6 (6 OCUs, 36 GB RAM) * 24 (hours) * 31 (days) = 1490.976 USD/month

Indexing charges
  • 0.334 * 2 (minimum 2 OCUs, 12 GB RAM) * 24 (hours) * 31 (days) = 496.992 USD/month

  • 0.334 * 4 (4 OCUs, 24 GB RAM) * 24 (hours) * 31 (days) = 993.984 USD/month

  • 0.334 * 6 (6 OCUs, 36 GB RAM) * 24 (hours) * 31 (days) = 1490.976 USD/month

Both "Search and Query" and "Indexing" fees are charged, so the actual monthly totals are:
  • Minimum 2 OCUs, 12 GB RAM each: 496.992 USD * 2 = 993.984 USD/month

  • 4 OCUs, 24 GB RAM each: 993.984 USD * 2 = 1987.968 USD/month

  • 6 OCUs, 36 GB RAM each: 1490.976 USD * 2 = 2981.952 USD/month

The fee is high, but Serverless is easy to manage because you don't have to worry about resource sizing and resource scaling.
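
The OCU arithmetic can be sketched as a small Python helper. It assumes the 0.334 USD per OCU per hour rate listed in the fee table, an always-on collection for a 31-day month, and equal OCU counts for indexing and search. The function name is illustrative, and managed storage is not included.

```python
OCU_RATE_USD_PER_HOUR = 0.334  # Tokyo Region rate from the fee table
HOURS = 24 * 31                # always on for a 31-day month


def monthly_ocu_cost(indexing_ocus: int, search_ocus: int) -> float:
    """Compute cost only; managed storage is billed separately."""
    total_ocus = indexing_ocus + search_ocus
    return round(total_ocus * OCU_RATE_USD_PER_HOUR * HOURS, 2)


for ocus_per_type in (2, 4, 6):
    cost = monthly_ocu_cost(ocus_per_type, ocus_per_type)
    print(f"{ocus_per_type} indexing + {ocus_per_type} search OCUs: {cost:.2f} USD/month")
```

Because both OCU types are billed at the same rate, the total depends only on the sum of indexing and search OCUs.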

Architecture using AWS IoT Core

This section summarizes which configuration works best.

OpenSearch EC2

With a managed domain running on EC2, data is stored using the following configuration. Routing the data through Kinesis Data Firehose is also possible.

OpenSearch Serverless

With the following configuration, both device telemetry and command operations from the application can be stored. It is also possible to forward only the necessary fields to OpenSearch using Lambda.
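
One way to forward only the necessary fields is a Lambda function attached to the delivery stream as a Kinesis Data Firehose data transformation. The sketch below assumes that setup and assumes the telemetry is JSON. The field names in `FIELDS_TO_KEEP` are hypothetical and should be replaced with the fields your devices actually send.

```python
import base64
import json

# Hypothetical field names; adjust to the device's actual payload.
FIELDS_TO_KEEP = ("client_id", "timestamp", "status")


def lambda_handler(event, context):
    """Firehose data-transformation handler that keeps only selected fields."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            slim = {key: payload[key] for key in FIELDS_TO_KEEP if key in payload}
        except (ValueError, TypeError):
            # Hand undecodable records back so Firehose can route them as failures.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        data = base64.b64encode(json.dumps(slim).encode("utf-8")).decode("utf-8")
        output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
    return {"records": output}
```

Each output record reuses the incoming `recordId`, which is how Firehose matches transformed records to the originals.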

About OpenSearch Serverless

I will summarize Serverless from here.

Use Cases

Time series – The log analytics segment that focuses on analyzing large volumes of semi-structured, machine-generated data in real-time for operational, security, user behavior, and business insights.

Search – Full-text search that powers applications in your internal networks (content management systems, legal documents) and internet-facing applications, such as e-commerce website search and content search.

Choosing a collection type

You can choose either "time series" or "search" as the collection type, but the type cannot be changed after the collection is created, so the differences are summarized below.

  • For search, all search data is stored in hot storage for faster query response time.

  • For time series, 12 hours of data is cached in hot storage and 7 days in warm storage.

  • The most important difference between search and time series collections is that in the case of time series, updates by document ID (POST index/_update/) are not possible. This operation is reserved only for the search use case.

  • Time-series workloads are write-intensive, whereas search workloads are read-intensive.

  • Search workloads have a smaller data corpus than time series workloads.

  • Search workloads are more sensitive to latency and require faster response than time-series workloads.

  • Queries on time-series data are executed against recent data, whereas search queries are executed against the entire corpus.

Security

  • You can choose whether network access is public or VPC. You can also specify encryption keys.

  • All OpenSearch Serverless data is encrypted in transit and at rest by default.

  • Granular collection-level and account-level security policies can be created for all collections and indexes.

  • Encryption policies can specify AWS Key Management Service (AWS KMS) keys for a single collection, all collections, or a subset of collections using wildcard matching patterns.

  • OpenSearch Dashboards can be accessed using SAML and AWS Identity and Access Management (IAM) credentials.
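
Encryption and network policies are expressed as JSON documents. The following sketch builds one of each for a wildcard collection pattern. The helper names and the `iot-*` pattern are hypothetical, and the resulting strings are what you would supply as the policy document when creating a security policy (for example through the console or the `create_security_policy` API); the code itself makes no AWS calls.

```python
import json
from typing import List, Optional


def encryption_policy(collection_pattern: str, kms_key_arn: Optional[str] = None) -> str:
    """Encryption policy JSON; uses an AWS owned key unless a KMS key ARN is given."""
    policy = {
        "Rules": [{
            "ResourceType": "collection",
            "Resource": [f"collection/{collection_pattern}"],
        }],
        "AWSOwnedKey": kms_key_arn is None,
    }
    if kms_key_arn is not None:
        policy["KmsARN"] = kms_key_arn
    return json.dumps(policy)


def network_policy(collection_pattern: str, vpc_endpoint_ids: Optional[List[str]] = None) -> str:
    """Network policy JSON; public access unless VPC endpoint IDs are given."""
    resource = [f"collection/{collection_pattern}"]
    rule = {
        "Rules": [
            {"ResourceType": "collection", "Resource": resource},
            {"ResourceType": "dashboard", "Resource": resource},
        ],
        "AllowFromPublic": not vpc_endpoint_ids,
    }
    if vpc_endpoint_ids:
        rule["SourceVPCEs"] = vpc_endpoint_ids
    return json.dumps([rule])


# "iot-*" is a hypothetical pattern covering every collection with that prefix.
print(encryption_policy("iot-*"))
print(network_policy("iot-*"))
```

Keeping the pattern in one place makes it easier to apply the same scope to both policy types.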

Fault tolerance

  • Indexes are replicated across Availability Zones by default.

  • Compute nodes for the index run in an active-standby configuration.

  • No need to worry about resource scaling.

  • Compute nodes scale out when the workload spikes and scale in when it drops.

  • All indexed data is stored in S3.

Cost-effectiveness

  • No need to size and provision resources upfront.

  • No need to over-provision for peak workloads in production environments.

  • Pay only for the computing and storage resources consumed by your workloads.

Serverless Notes

  • There is no way to automatically migrate data from a managed OpenSearch Service domain to an OpenSearch Serverless collection; data must be re-indexed from the domain to the collection.

  • It is not possible to take or restore a snapshot of an OpenSearch Serverless collection.

  • The index refresh interval may range from 10 to 30 seconds, depending on the size of the request.

  • OpenSearch Serverless is upgraded automatically, so watch for breaking changes. For example, breaking changes applied since OpenSearch 3.0 include changes in terminology, such as GET _cat/master -> GET _cat/cluster_manager.

  • The collection type ("time series" or "search") cannot be changed after the collection has been created.

OpenSearch Serverless API operations

API operations are referenced here.

I published sample data to AWS IoT Core, delivered it to OpenSearch Serverless through Kinesis Data Firehose, and checked which kinds of searches work.

The queries I actually verified are summarized below.
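
The queries in this section filter on `client_id`, a numeric `timestamp` such as 20230209160000, and `status.power`. A sample message with that shape could be built as in the sketch below. The payload layout is an assumption inferred from those field names, and `build_sample_payload` is a hypothetical helper, not part of any AWS SDK.

```python
import json
from datetime import datetime


def build_sample_payload(client_id: str, when: datetime, power: str) -> str:
    """Serialize one telemetry message with a numeric yyyyMMddHHmmss timestamp."""
    return json.dumps({
        "client_id": client_id,
        # An integer timestamp lets range queries compare values numerically.
        "timestamp": int(when.strftime("%Y%m%d%H%M%S")),
        "status": {"power": power},
    })


print(build_sample_payload("SampleDevice001", datetime(2023, 2, 9, 16, 23, 10), "on"))
```

The resulting JSON string is what would be published to an MQTT topic in AWS IoT Core.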

# Check index mapping definitions
GET sample-logs/_mappings

# Match all documents
GET sample-logs/_search
{
    "query": {
        "match_all": {
        }
    },
    "track_total_hits": true
}

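# Term query (no hits: client_id is an analyzed text field, so the indexed tokens are lowercased and the exact value does not match)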
GET sample-logs/_search
{
  "query": {
    "term": {
      "client_id": {
        "value": "SampleDevice001"
      }
    }
  },
  "size": 10
}

# Term query (hits: the .keyword subfield stores the exact, unanalyzed value)
GET sample-logs/_search
{
  "query": {
    "term": {
      "client_id.keyword": {
        "value": "SampleDevice001"
      }
    }
  },
  "size": 10
}


# Match query (full-text query; the search text is analyzed, so it matches the text field)
GET sample-logs/_search
{
  "size": 10,
  "query": {
    "match": {
      "client_id": "SampleDevice001"
    }
  }
}

# Range query
GET sample-logs/_search
{
  "size": 20, 
  "query": {
    "range": {
      "timestamp": {
        "gte": 20230209160000,
        "lte": 20230209162355
      }
    }
  }
}

# Prefix (no hits: the prefix is compared against the lowercased tokens of the text field)
GET sample-logs/_search
{
  "query": {
    "prefix": {
      "client_id": "Sample"
    }
  }
}

# Prefix (hits: uses the .keyword subfield)
GET sample-logs/_search
{
  "query": {
    "prefix": {
      "client_id.keyword": {
        "value": "Sample"
      }
    }
  }
}

# Wildcard
GET sample-logs/_search
{
  "query": {
    "wildcard": {
      "client_id": "Sample*"
    }
  }
}

# Bool query (combine multiple criteria; every clause under "must" has to match)
GET sample-logs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "client_id": "SampleDevice001"
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": 20230209162300,
              "lte": 20230209162330
            }
          }
        }
      ]
    }
  },
  "size": 10
}

# Aggregations (count documents per client_id)
GET sample-logs/_search
{
  "size":0,
  "aggs": {
    "client_id": {
      "terms":{"field": "client_id.keyword"}
    }
  }
}

# Aggregations (terms aggregation on a field inside a nested object)
GET sample-logs/_search
{
  "size":0,
  "aggs": {
    "uav_status.flight_st": {
      "terms":{"field": "status.power.keyword"}
    }
  }
}


# Serverless does not support SQL
POST xxxxxxxxxxxxxxxxxx.ap-northeast-1.aoss.amazonaws.com/_plugins/_sql
{
  "query": "SELECT * FROM sample-logs LIMIT 50"
}


#===========================================
# Check if you can do the relation

# Parent-child relationship mapping definition
PUT testindex1
{
  "mappings": {
    "properties": {
      "product_to_brand": {
        "type": "join",
        "relations": {
          "brand": "product" 
        }
      }
    }
  }
}
# Confirmation of parent-child relationship mapping
GET testindex1/_mappings

# Index a parent document
POST testindex1/_doc
{
  "name": "Brand 1",
  "product_to_brand": {
    "name": "brand" 
  }
}

# Index a child document (routing must point to the parent document)
POST testindex1/_doc?routing=1%3A0%3A0uSyOoYB4ZeEJo4ZDIbe
{
  "name": "Product 1",
  "product_to_brand": {
    "name": "product", 
    "parent": "1" 
  }
}

# Search Parent-Child Relationship
GET testindex1/_search
{
  "query": {
    "match_all": {}
  }
}

Conclusion

OpenSearch Serverless only recently reached GA, so I look forward to future feature upgrades.
