Turning HSDA Service At Location into a Search-Optimized Format

Connect 211 offers the same Elasticsearch endpoint that we consume internally to external enterprise users as a service. All of our work follows the standards set forth by Open Referral in HSDS (Human Services Data Specification) and HSDA (Human Services Data API) wherever appropriate.

When we built our first Elasticsearch endpoint, HSDA was not being emphasized, so we pretty much designed it ad hoc. Since then, HSDA has come into its own as a full-blown API specification. We took a recent opportunity while updating our Elasticsearch endpoint to also bring it into closer alignment with HSDA. In particular, the service_at_location endpoint is very close to what we need, and we decided to follow it as closely as possible. However, a few adjustments were required to optimize performance.

These items specifically apply to Elasticsearch, and are only more broadly applicable insofar as other search technologies use similar paradigms (which many do).

Only keep what we need

The first and most obvious modification for performant search is to filter out anything that doesn’t:

Have keyword value, like names and descriptions
Provide direct, quick access to the resource, like a phone number
Augment navigability with categories (taxonomies), facets (for filtering), or geospatial data (also for filtering)

Our search records are stripped down considerably, keeping only what users search and filter on to find resources, plus a few “quick contact” elements.
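As a rough illustration of that kind of pruning, here is a minimal Python sketch that keeps only allowlisted fields, recursing into nested objects. The allowlist and the `prune` helper are hypothetical; the fields we actually keep are not listed here, so treat the names as placeholders.

```python
from typing import Any

# Hypothetical allowlist: the exact fields kept are an assumption for illustration.
SEARCH_FIELDS: dict[str, Any] = {
    "id": True,
    "name": True,
    "description": True,
    "phones": True,
    "taxonomies": True,
    "location": {"name": True, "latitude": True, "longitude": True},
}


def prune(record: dict[str, Any], allow: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of record containing only allowlisted keys."""
    out: dict[str, Any] = {}
    for key, rule in allow.items():
        if key not in record:
            continue
        value = record[key]
        if isinstance(rule, dict) and isinstance(value, dict):
            # Nested object: apply the nested allowlist.
            out[key] = prune(value, rule)
        else:
            out[key] = value
    return out
```

Anything not named in the allowlist, such as audit timestamps or internal notes, never reaches the search document.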

Disambiguate data

There is something that we add to the data: clarity. Something we learned and implemented with the first endpoint was that we need to know which phone, of many, should be displayed to users first. Same for schedule info, addresses, URLs, etc. We add fields such as schedule, phone (and a priority field on the phones array!), etc., to make clear what should be displayed to users, and in what order.
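A small Python sketch of how a priority field can drive display order. It assumes that a lower priority number means "show first" and that phones without a priority go last; both conventions, and the `number` key, are assumptions made for this example.

```python
def order_phones(phones):
    """Sort phones by ascending priority; phones with no priority go last."""
    return sorted(
        phones,
        key=lambda p: (p.get("priority") is None, p.get("priority") or 0),
    )


def primary_phone(phones):
    """Pick the number to show first, or None if there are no phones."""
    ordered = order_phones(phones)
    return ordered[0]["number"] if ordered else None
```

With this in place, a top-level phone field could be filled from `primary_phone(record["phones"])` at index time, so clients never have to guess which number to show.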

In addition, standardizing names and descriptions between different data sources may require a concatenation of service at location or service by organization, depending on localized style guides. These get created and normalized in the service_at_location.name|description fields.
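To make the concatenation concrete, here is a minimal Python sketch. The `style` switch and the "at" and "by" joiners are hypothetical stand-ins for whatever a local style guide prescribes.

```python
def display_name(service, other, style):
    """Join a service name with a location or organization name.

    `style` is a hypothetical switch standing in for a local style guide.
    """
    joiners = {"service_at_location": "at", "service_by_organization": "by"}
    return f"{service.strip()} {joiners[style]} {other.strip()}"
```

For example, `display_name("Food Pantry", "Main Street Center", "service_at_location")` gives "Food Pantry at Main Street Center".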

Nesting should situationally be avoided

A first basic principle is that nesting objects is acceptable for performance, and good for organization. Yay! We decided to nest location and service just like the HSDA example shows: https://github.com/openreferral/specification/blob/3.0/examples/service_at_location_full.json

However, using arrays of objects is pretty bad for performance and should be avoided as much as possible. This is because nested items in an array are essentially treated as separate, searchable documents, so the search engine has to keep track of those relationships and add joins to searches.
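The difference shows up in both the mapping and the query. The sketch below uses Python dicts in the shape of Elasticsearch request bodies; the field names are hypothetical and this is not our production mapping. A plain object is queried by its dotted path, while an array mapped with the `nested` type has to be wrapped in a `nested` query, which is where the join cost comes from.

```python
# Mapping sketch: `location` is a plain object, `phones` is an array mapped as `nested`.
mapping = {
    "mappings": {
        "properties": {
            "location": {
                "properties": {
                    "name": {"type": "text"},
                }
            },
            "phones": {
                "type": "nested",
                "properties": {
                    "number": {"type": "keyword"},
                    "priority": {"type": "integer"},
                },
            },
        }
    }
}

# A field inside a plain object is queried directly by its dotted path.
object_query = {"query": {"match": {"location.name": "Main Street"}}}

# A field inside a `nested` array must be wrapped in a `nested` query.
nested_query = {
    "query": {
        "nested": {
            "path": "phones",
            "query": {"term": {"phones.priority": 1}},
        }
    }
}
```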

The HSDS `address` object

This last point directly impacted how closely we could follow the specification, because addresses are essential to include, but are nested in an array in HSDS, with a type field differentiating between physical and postal addresses. That’s great for database normalization, but not for performant search documents, so we added a physical_address key to the location object. We aren’t concerned with postal addresses for this particular endpoint.
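A minimal Python sketch of that flattening step. It assumes the HSDS 3.0 shape, where a location carries an `addresses` array and each entry has an `address_type` field. Dropping the array from the search document and taking the first physical address are assumptions made for this example.

```python
def add_physical_address(location):
    """Return a copy of location with its physical address promoted to a key."""
    addresses = location.get("addresses", [])
    # Take the first address marked as physical, if any.
    physical = next(
        (a for a in addresses if a.get("address_type") == "physical"),
        None,
    )
    # Drop the array so the search document has no address array to join on.
    flattened = {k: v for k, v in location.items() if k != "addresses"}
    if physical is not None:
        flattened["physical_address"] = {
            k: v for k, v in physical.items() if k != "address_type"
        }
    return flattened
```

The result is a plain nested object at location.physical_address, which can be queried by dotted path like any other object field.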

Summary

In the end, we created a search object that is as close to HSDA as we could get while still following best practices and common sense for optimized searching. Significantly, the only structural change was how we structure address data. The rest was merely additive or subtractive.