Index Geo Flight Data Using Azure Search Push API (Python version)

Think Gradient · 4 min read · Jan 9, 2022

Author: Fatos Ismali

In this tutorial, you will learn how to index data using the Azure Search Push API instead of the default indexer from the Azure Search Portal. You will learn the following key elements:

  • How to index JSON documents in batch
  • How to iterate through JSON blobs in Azure Storage using the Python SDK and index each file in Azure Search
  • How to populate a GeoPoint field in Azure Search from latitude and longitude values

We’re going to use the flight traffic data publicly available at https://www.adsbexchange.com/data/. Specifically, we use the sample data from 1 August 2021, available at https://samples.adsbexchange.com/readsb-hist/2021/08/01.

Prerequisites:

Azure Cognitive Search supports two main mechanisms for indexing data:

  • The Push mechanism lets you send any data, expressed as JSON documents, to an Azure Search index. It gives you the flexibility to index data from any source and at any frequency, either in real time or in batches. A batch can hold up to 1000 documents or 16 MB, whichever limit is reached first.
  • The Pull mechanism uses the built-in indexer functionality of Azure Search to pull data from various supported sources. An indexer can be scheduled to run periodically. You can also run multiple indexers in parallel as long as you have enough partitions in your Azure Search service. An Azure Search partition is mapped to a single indexer.
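
The batch limits mentioned for the Push mechanism can be respected with a small helper. The sketch below is an illustration rather than the code from the tutorial's repository: the names make_batches and to_index_payload are hypothetical, and the size check is only an estimate based on each document's JSON encoding. The payload shape (a "value" array whose items carry an "@search.action") is the one used by the REST "docs/index" operation.

```python
import json
from typing import Dict, Iterable, Iterator, List

MAX_DOCS_PER_BATCH = 1000            # document-count limit per indexing request
MAX_BATCH_BYTES = 16 * 1024 * 1024   # approximate payload limit per request


def make_batches(
    docs: Iterable[Dict],
    max_docs: int = MAX_DOCS_PER_BATCH,
    max_bytes: int = MAX_BATCH_BYTES,
) -> Iterator[List[Dict]]:
    """Yield lists of documents that stay within both limits.

    Size is estimated from each document's UTF-8 JSON encoding, so it slightly
    underestimates the final request body (envelope and action fields).
    """
    batch: List[Dict] = []
    batch_bytes = 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        # Close the current batch if adding this document would break a limit.
        if batch and (len(batch) >= max_docs or batch_bytes + doc_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch


def to_index_payload(batch: List[Dict]) -> Dict:
    """Wrap a batch in the body shape used by the REST 'docs/index' operation."""
    return {"value": [{"@search.action": "upload", **doc} for doc in batch]}
```

Each payload would then be sent as the JSON body of a POST to the index's docs/index endpoint, or the plain batch handed to an SDK upload call.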

We will use the Push mechanism in this tutorial to index flight data from an Azure Storage account container into an Azure Search index. As you can see from the Push diagram above, the data could just as well reside in an AWS S3 bucket, a Google Cloud Storage bucket, or any other data service across Azure, AWS, or GCP, as long as the service exposes a REST API or SDK.
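
As a rough sketch of that flow (not the repository's actual code), the helpers below walk every blob in a container, parse one JSON document per line, and hand the documents to an upload function in batches. The assumptions are that each blob is UTF-8 text with one JSON document per line and that each document already matches the index schema, including its key field. The names iter_blob_documents, push_all and main are hypothetical, and the connection values are placeholders. The Azure SDK imports sit inside main() so the helpers can be tried without the SDKs installed.

```python
import json
from typing import Callable, Dict, Iterator, List


def iter_blob_documents(container) -> Iterator[Dict]:
    """Yield one parsed JSON document per non-empty line of every blob.

    `container` is expected to behave like azure.storage.blob.ContainerClient,
    i.e. to offer list_blobs() and download_blob(name).readall().
    """
    for blob in container.list_blobs():
        raw = container.download_blob(blob.name).readall()
        for line in raw.decode("utf-8").splitlines():
            if line.strip():
                yield json.loads(line)


def push_all(container, upload: Callable[[List[Dict]], object], batch_size: int = 1000) -> int:
    """Send every document to `upload` in batches; return the number sent."""
    batch: List[Dict] = []
    sent = 0
    for doc in iter_blob_documents(container):
        batch.append(doc)
        if len(batch) == batch_size:
            upload(batch)
            sent += len(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        upload(batch)
        sent += len(batch)
    return sent


def main() -> None:
    # Imported here so the helpers above work without the Azure SDKs installed.
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient
    from azure.storage.blob import ContainerClient

    # Placeholder values; replace them with your own.
    container = ContainerClient.from_connection_string(
        "<storage-connection-string>", "<container-name>"
    )
    search = SearchClient(
        "https://<service-name>.search.windows.net",
        "<index-name>",
        AzureKeyCredential("<admin-key>"),
    )
    total = push_all(container, lambda docs: search.upload_documents(documents=docs))
    print(f"Uploaded {total} documents")
```

Calling main() with real connection values would run the full copy. Passing the upload step in as a function keeps the blob iteration independent of the search client, so the same loop could feed a plain REST call instead.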

The code with instructions supporting this tutorial can be found at the following link: https://github.com/thinkgradient/azure-search-push-python

Once you’ve gone through the steps above and the instructions in the GitHub README.md, your Azure Search index will be populated with flight data as follows:

Notice how the id of each document has been generated by applying an MD5 hash function to the JSON document (line) itself, and how that hash is used as the document key in the index.
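
A minimal sketch of that key generation, assuming each line of the input is one JSON document and the key field is called "id" (the function name add_document_key is hypothetical, not taken from the repository):

```python
import hashlib
import json


def add_document_key(line: str, key_field: str = "id") -> dict:
    """Parse one JSON line and attach the MD5 hex digest of the raw line as its key."""
    doc = json.loads(line)
    # The digest is 32 hexadecimal characters and depends only on the line's bytes.
    doc[key_field] = hashlib.md5(line.encode("utf-8")).hexdigest()
    return doc
```

Because the key depends only on the line's content, pushing the same line again produces the same key, so the existing document is overwritten instead of duplicated.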

You can see from the screenshot above that each field has a specific type. In addition to the type, Azure Search lets you specify whether each field is searchable, retrievable, facetable, sortable, and so forth. All the index fields are defined in the geodata-schema.json file (in the GitHub repository). For example, the “flight” field has the following attributes:

For a more in-depth explanation of each of these attributes, refer to the following: https://docs.microsoft.com/en-us/azure/search/search-what-is-an-index#field-attributes.
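
To make the shape of a field definition concrete, here is a hedged sketch that builds entries for the "fields" array of an index definition. The attribute names are the ones Azure Search uses; the helper make_field is hypothetical, and the attribute values shown are illustrative only. The tutorial's real settings are the ones in geodata-schema.json.

```python
from typing import Dict

ALLOWED_ATTRIBUTES = {"key", "searchable", "filterable", "sortable", "facetable", "retrievable"}


def make_field(name: str, edm_type: str, **attributes: bool) -> Dict:
    """Build one entry for the "fields" array of an index definition."""
    unknown = set(attributes) - ALLOWED_ATTRIBUTES
    if unknown:
        raise ValueError(f"unsupported attribute(s): {sorted(unknown)}")
    return {"name": name, "type": edm_type, **attributes}


# Illustrative values only; they are not copied from geodata-schema.json.
example_fields = [
    make_field("id", "Edm.String", key=True, retrievable=True),
    make_field("flight", "Edm.String", searchable=True, filterable=True, retrievable=True),
]
```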

Also note how we’ve combined the latitude and longitude values into a GeoPoint type, which Azure Search supports out of the box. Once you have a GeoPoint available in your indexed data, you can run all sorts of geo-spatial functions, such as geo.distance to calculate the distance between two points in kilometers, or geo.intersects to determine whether a given point lies within a given polygon. For more examples, refer to the following link: https://docs.microsoft.com/en-us/azure/search/search-query-odata-geo-spatial-functions

Note: You may need to delete and re-create the index if you need to change an attribute of an existing field from false to true or vice versa.

To interact with the indexed data, you can either use the Azure Search Portal UI or issue REST API requests against the service (through Postman, Python, or your favorite language). In the GitHub code you will find a sample Jupyter Notebook that shows how to interact with the Search service index through its REST API. For example, to retrieve all flights within 10 km of the point POINT(-71.060867 42.854651), we issue the following REST request:
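
The original request is not reproduced here, so the following is only a sketch of what such a query could look like. It builds the URL and JSON body for the REST "docs/search" operation, using geo.distance in an OData filter. The GeoPoint field name "location", the API version, and the function name build_geo_search are assumptions; the service and index names come from the response shown below.

```python
from typing import Dict, Tuple


def build_geo_search(
    service: str,
    index: str,
    lon: float,
    lat: float,
    radius_km: float,
    geo_field: str = "location",      # assumed name of the GeoPoint field
    api_version: str = "2020-06-30",  # assumed API version
) -> Tuple[str, Dict]:
    """Return (url, body) for a search that keeps documents within radius_km of a point."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/search?api-version={api_version}"
    )
    body = {
        "search": "*",
        # geo.distance returns kilometers; POINT takes longitude, then latitude.
        "filter": f"geo.distance({geo_field}, geography'POINT({lon} {lat})') le {radius_km}",
        "select": "flight",
        "count": True,
    }
    return url, body


url, body = build_geo_search("searchservicename", "flightindex", -71.060867, 42.854651, 10)
# To send it (requires the third-party `requests` package and a query key):
#   requests.post(url, headers={"api-key": "<query-key>"}, json=body).json()
```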

The request above would result in:

{'@odata.context': "https://searchservicename.search.windows.net/indexes('flightindex')/$metadata#docs(*)",
'@odata.count': 4,
'value': [{'@search.score': 1.0, 'flight': 'N7931K '},
{'@search.score': 1.0, 'flight': 'N7931K '},
{'@search.score': 1.0, 'flight': 'N442MG '},
{'@search.score': 1.0, 'flight': 'N442MG '}]}
