If you've spent time in a research lab that collects sensor data, you've probably seen this scenario: a data file gets passed from one computer to the next, and someone modifies it slightly to run an analysis or a simulation. That person then passes the modified copy to a colleague, who does the same. A few rounds later, two researchers are working from different versions of the same dataset without realizing it. Their results don't match and neither knows why. Both spend an afternoon debugging instead of doing actual research.
This isn't an unusual failure; it's the default outcome when a team shares data informally, computer to computer, with no central coordination. Layer on top of that the reality of modern research data: it rarely comes from a single, clean source. You're pulling from open data portals like CKAN, live sensor APIs, internal databases, and file servers, each with its own data format, access method, and quirks. Every researcher ends up writing their own integration glue, and the result is a lab full of scripts that all technically work but don't agree with each other.
CITYdata is our attempt to address this. It's an open-source middleware that provides a unified REST API for accessing and processing IoT data from heterogeneous sources, with transparent caching built in so that heavy remote datasets aren't re-fetched every time someone new needs them.
I. The Core Idea of CITYdata
A core part of the vision, currently a work in progress, is transparent caching. Datasets from portals like CKAN can be large and slow to fetch over the network. If three researchers independently pull the same dataset in the same week, that's three downloads where one would do, and three local copies with three slightly different preprocessing steps. The idea is that CITYdata fetches the dataset once, stores it centrally, and serves subsequent requests from cache, without researchers needing to think about it. The existing /exists route lays the groundwork for this: it lets you check whether data matching your query has already been pulled before committing to a full fetch.
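To make the fetch-once idea concrete, here is a minimal in-memory sketch. It is not CITYdata's implementation: the class name FetchOnceCache, the dataset identifier, and the fake download function are all assumptions made for illustration, and the exists method only mirrors the spirit of the /exists route.

```python
from typing import Callable, Dict, List


class FetchOnceCache:
    """Illustrative fetch-once cache (hypothetical, not CITYdata's code)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                 # the slow remote download
        self._store: Dict[str, bytes] = {}  # central copy, keyed by dataset id

    def exists(self, dataset_id: str) -> bool:
        """Same idea as /exists: has this dataset already been pulled?"""
        return dataset_id in self._store

    def get(self, dataset_id: str) -> bytes:
        """Download on the first request, serve from the store afterwards."""
        if dataset_id not in self._store:
            self._store[dataset_id] = self._fetch(dataset_id)
        return self._store[dataset_id]


downloads: List[str] = []  # records every real download that happens


def slow_remote_fetch(dataset_id: str) -> bytes:
    """Stand-in for a large download from a remote portal."""
    downloads.append(dataset_id)
    return f"contents of {dataset_id}".encode()


cache = FetchOnceCache(slow_remote_fetch)

# Three researchers ask for the same dataset in the same week.
results = [cache.get("ckan/air-quality") for _ in range(3)]

print(len(downloads))                    # number of real downloads
print(cache.exists("ckan/air-quality"))  # already pulled?
```

All three requests receive identical bytes, but only the first one triggers a download, so there is one shared copy instead of three diverging ones.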
Because CITYdata is a standard REST API, it integrates naturally with whatever stack a researcher is already using. You can call it with Python's requests library, JavaScript's fetch, Java's HttpClient, or Postman if you'd rather skip writing code entirely.
II. CITYdata Architecture

CITYdata is a pattern-driven middleware built upon the Observer, Producer-Consumer, and Publisher-Subscriber design patterns. When a user sends a request to CITYdata, the middleware instantiates a Runner to orchestrate the entire workflow. The Runner invokes one or more Producers, the components that connect to data sources (sensors, APIs, databases) and retrieve the relevant data. That data is then passed through a sequence of Operations, which apply the transformations the user requested. The final result is returned to the user, either as raw or processed data. This workflow rests on three main abstractions: Producers, Operations, and Runners.
II.1. Abstractions
Producers connect to data sources and fetch data. Each Producer knows how to talk to one specific kind of source: a PostgreSQL database, a MongoDB collection, a CKAN open data portal, a CSV file server, or a live sensor API. When writing a query, the user specifies which Producer to use and passes the parameters it expects, as the examples section illustrates.
Operations describe transformations to apply to Producer outputs, such as filtering rows by time window, or counting occurrences where a sensor value meets specific conditions. Operations can be chained to build a full processing pipeline within a single query.
Runners orchestrate everything by implementing the Observer and Publisher/Subscriber patterns under the hood. To process a query, a Runner invokes the relevant Producers, executes one or more Operations on their outputs, and returns the result to the consumer.
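The relationship between the three abstractions can be sketched in a few lines of Python. This is a simplified, synchronous illustration only: every class and function name below is hypothetical, the data is an in-memory list, and the Observer and Publisher-Subscriber mechanics that CITYdata's Runners use are deliberately left out.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Operation = Callable[[List[Record]], List[Record]]


class ListProducer:
    """Hypothetical Producer that 'fetches' records from an in-memory list."""

    def __init__(self, records: List[Record]) -> None:
        self._records = list(records)

    def fetch(self) -> List[Record]:
        return list(self._records)


def filter_by_time_window(start: float, end: float) -> Operation:
    """Operation: keep records whose timestamp 't' lies in [start, end]."""
    def operation(records: List[Record]) -> List[Record]:
        return [r for r in records if start <= r["t"] <= end]
    return operation


def count_where(predicate: Callable[[Record], bool]) -> Operation:
    """Operation: reduce the records to a single record holding a count."""
    def operation(records: List[Record]) -> List[Record]:
        return [{"count": sum(1 for r in records if predicate(r))}]
    return operation


class Runner:
    """Invokes the Producers, then applies the Operations in order."""

    def __init__(self, producers: List[ListProducer],
                 operations: List[Operation]) -> None:
        self._producers = producers
        self._operations = operations

    def run(self) -> List[Record]:
        data: List[Record] = []
        for producer in self._producers:
            data.extend(producer.fetch())
        for operation in self._operations:
            data = operation(data)
        return data


readings = [
    {"t": 1, "value": 18.5},
    {"t": 2, "value": 22.0},
    {"t": 3, "value": 25.1},
    {"t": 9, "value": 30.0},
]

runner = Runner(
    producers=[ListProducer(readings)],
    operations=[
        filter_by_time_window(1, 3),
        count_where(lambda r: r["value"] > 20),
    ],
)
result = runner.run()
print(result)  # [{'count': 2}]
```

Chaining works because every Operation takes a list of records and returns a list of records, so the output of one step is a valid input for the next.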
Tables 1 and 2 illustrate a subset of the Producers and Operations currently available in CITYdata.


II.2. The API Routes
Like any REST API, CITYdata exposes a set of endpoints to interact with its services. All endpoints share a common base URL, which is the URL of the server hosting CITYdata. To test these endpoints, you can use an API client tool like Postman by combining the base URL with any of the routes below:
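As a small illustration of combining the base URL with a route, the sketch below joins a placeholder base URL with the routes named elsewhere in this post (/authenticate, /apply/sync, /apply/async, /exists). The host example.org and the helper name endpoint are assumptions, and the list is not meant to be the complete set of routes.

```python
# Placeholder base URL: replace the host with your CITYdata server (assumption).
BASE_URL = "http://example.org/citydata"

# Only the routes mentioned in this post; the real API may expose more.
ROUTES = ("/authenticate", "/apply/sync", "/apply/async", "/exists")


def endpoint(base_url: str, route: str) -> str:
    """Join base URL and route with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + route.lstrip("/")


urls = {route: endpoint(BASE_URL, route) for route in ROUTES}

for route, url in urls.items():
    print(f"{route:15} -> {url}")
```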

III. CITYdata Usage Examples
Beyond Postman, CITYdata can be integrated directly into your application. The example below shows the minimal setup needed to connect and query CITYdata from Python. Every interaction follows the same two steps:
1. Authenticate using your credentials to receive a Bearer token
2. Submit a query to /apply/sync (or /apply/async) with your token and a JSON body specifying the producer and parameters
import json

import requests

BASE_URL = "http://{baseURL}/citydata"  # Replace {baseURL} with the server address
USERNAME = "your_username"
PASSWORD = "your_password"

# Step 1: Authenticate
def authenticate(base_url, username, password):
    """Exchange credentials for a Bearer token."""
    response = requests.get(f"{base_url}/authenticate", auth=(username, password))
    if response.status_code == 200:
        return response.text  # The response body is the Bearer token
    raise Exception(f"Authentication failed: {response.status_code}")

# Step 2: Submit a query
def fetch_data(base_url, token, json_input):
    """POST the query to /apply/sync and return the parsed JSON result."""
    response = requests.post(
        f"{base_url}/apply/sync",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        data=json.dumps(json_input),
    )
    if response.status_code == 200:
        return response.json()
    raise Exception(f"Query failed: {response.status_code}")
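The json_input argument is the JSON body that names the producer and its parameters. The sketch below shows one way such a body could be assembled before being handed to fetch_data. Every key name ("producer", "params", "operations") and every producer or operation name in it is a placeholder chosen for illustration, not CITYdata's actual query schema, so check the repository documentation for the real field names.

```python
import json
from typing import Any, Dict, Iterable, Mapping


def build_query(producer: str,
                params: Mapping[str, Any],
                operations: Iterable[Mapping[str, Any]] = ()) -> Dict[str, Any]:
    """Assemble a query body. All key names here are PLACEHOLDERS."""
    return {
        "producer": producer,                           # which Producer to use
        "params": dict(params),                         # parameters for it
        "operations": [dict(op) for op in operations],  # optional pipeline
    }


query = build_query(
    producer="ExampleSensorProducer",  # hypothetical producer name
    params={"city": "montreal"},       # hypothetical parameter
    operations=[
        # hypothetical operation and parameter names
        {"name": "ExampleTimeFilter", "params": {"start": "2024-01-01"}},
    ],
)

# The body must survive JSON serialization, since it is sent over HTTP.
payload = json.dumps(query, indent=2)
print(payload)
```

With the two functions defined earlier, the full flow would then be token = authenticate(BASE_URL, USERNAME, PASSWORD) followed by result = fetch_data(BASE_URL, token, query).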
IV. Conclusion
If the opening scenario resonated with you, CITYdata is built to fix it. In this post, we covered its architecture, core abstractions, available endpoints, and how to query it via Postman or integrate it into your Python application. CITYdata is open source and publicly available at https://github.com/ptidejteam/tools4cities-CITYdata.git. Try it out and send us your feedback!