avwx.service

Report Source Services

AVWX fetches raw weather reports from third-party services via REST API calls or file downloads. Service objects handle the request and extraction steps for us.

Basic Module Use

METARs and TAFs are the most widely supported report types, so an effort has been made to source some of them from regional services. The get_service function was introduced to determine the best service for a given station.

# Fetch Australian reports
station = 'YWOL'
country = 'AU' # can source from avwx.Station.country
# Get the station's preferred service and initialize to fetch METARs
service = avwx.service.get_service(station, country)('metar')
# service is now avwx.service.Aubom init'd to fetch METARs
# Fetch the current METAR
report = service.fetch(station)

Other report types require specific service classes which are found in their respective submodules. However, you can normally let the report type classes handle these services for you.

Adding a New Service

If the existing services are not supplying the report(s) you need, adding a new service is easy. First, you'll need to determine whether your source can be scraped or must be downloaded as a file.

ScrapeService

For web scraping sources, you'll need to do the following things:

  • Add the base URL and method (if not "GET")
  • Implement the ScrapeService._make_url method to return the source URL and query parameters
  • Implement the ScrapeService._extract method to return just the report string (starting at the station ID) from the response

Let's look at the Mac service as an example:

class Mac(StationScrape):
    """Request data from Meteorologia Aeronautica Civil for Colombian stations."""

    _url = "https://meteorologia.aerocivil.gov.co/expert_text_query/parse"
    method = "POST"

    def _make_url(self, station: str) -> tuple[str, dict]:
        """Return a formatted URL and parameters."""
        return self._url, {"query": f"{self.report_type} {station}"}

    def _extract(self, raw: str, station: str) -> str:
        """Extract the report message using string finding."""
        return self._simple_extract(raw, f"{station.upper()} ", "=")

Our URL and query parameters are returned by _make_url so fetch knows how to request the report. The response text is then passed to _extract, which returns the report or list of reports.
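The _simple_extract call in the example above is, at its core, bounded string slicing. Here is a rough sketch of the idea as a hypothetical stand-alone helper (not the library's exact implementation):

```python
def simple_extract(raw: str, start: str, end: str) -> str:
    """Return the substring of `raw` from `start` up to, not including, `end`."""
    index = raw.find(start)
    if index == -1:
        return ""
    report = raw[index:]
    stop = report.find(end)
    if stop != -1:
        report = report[:stop]
    return report.strip()

# A made-up response wrapping a report in markup, ended by "="
raw = "<pre>SKBO 211300Z 27005KT 9999 SCT020= </pre>"
print(simple_extract(raw, "SKBO ", "="))  # the report, shorn of markup
```

Starting the search at the station ID and stopping at a known terminator (here "=") is what lets one extractor survive cosmetic changes to the surrounding page.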

Once your service is created, it can optionally be added to avwx.service.scrape.PREFERRED if it covers all stations with a known ICAO prefix, or to avwx.service.scrape.BY_COUNTRY if it covers all stations in a single country. This is how avwx.service.get_service determines the preferred service. For example, Mac is preferred over Noaa for all ICAOs starting with "SK", while Aubom is better for all Australian stations.
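The lookup described above can be sketched with plain dictionaries (a simplified, hypothetical model; the real mappings hold service classes rather than strings):

```python
# Hypothetical stand-ins for avwx.service.scrape.PREFERRED / BY_COUNTRY
PREFERRED = {"SK": "Mac"}      # ICAO prefix -> service
BY_COUNTRY = {"AU": "Aubom"}   # country code -> service

def preferred_service(station: str, country: str, default: str = "Noaa") -> str:
    """ICAO-prefix match wins; fall back to country, then the default."""
    try:
        return PREFERRED[station[:2]]
    except KeyError:
        return BY_COUNTRY.get(country, default)

print(preferred_service("SKBO", "CO"))  # Mac
print(preferred_service("YWOL", "AU"))  # Aubom
print(preferred_service("KJFK", "US"))  # Noaa
```

The prefix check runs first, so a regional service always beats the country-wide fallback.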

FileService

For file-based sources, you'll need to do the following things:

  • Add the base URL and valid report types
  • Implement the FileService._urls property to iterate through source URLs
  • Implement the FileService._extract method to return just the report string (starting at the station ID) from the downloaded file

Let's look at the NoaaNbm service as an example:

class NoaaNbm(FileService):
    """Requests forecast data from NOAA NBM FTP servers"""

    _url = "https://nomads.ncep.noaa.gov/pub/data/nccf/com/blend/prod/blend.{}/{}/text/blend_{}tx.t{}z"
    _valid_types = ("nbh", "nbs", "nbe")

    @property
    def _urls(self) -> Iterator[str]:
        """Iterates through hourly updates no older than one day"""
        date = dt.datetime.now(tz=dt.timezone.utc)
        cutoff = date - dt.timedelta(days=1)
        while date > cutoff:
            timestamp = date.strftime(r"%Y%m%d")
            hour = str(date.hour).zfill(2)
            yield self._url.format(timestamp, hour, self.report_type, hour)
            date -= dt.timedelta(hours=1)

    def _extract(self, station: str, source: TextIO) -> Optional[str]:
        """Returns report pulled from the saved file"""
        start = station + "   "
        end = self.report_type.upper() + " GUIDANCE"
        txt = source.read()
        txt = txt[txt.find(start) :]
        txt = txt[: txt.find(end, 30)]
        lines = []
        for line in txt.split("\n"):
            if "CLIMO" not in line:
                line = line.strip()
            if not line:
                break
            lines.append(line)
        return "\n".join(lines) or None

In this example, we iterate through _urls looking for the most recent published file. URL iterators should always have a lower bound to stop iteration so the service can return a null response.
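A bounded URL iterator of the kind described can be sketched as follows (the template URL is a placeholder, not a real endpoint):

```python
import datetime as dt
from typing import Iterator

def hourly_urls(template: str, report_type: str) -> Iterator[str]:
    """Yield candidate URLs newest-first, stopping 24 hours back."""
    date = dt.datetime.now(tz=dt.timezone.utc)
    cutoff = date - dt.timedelta(days=1)  # the lower bound
    while date > cutoff:
        timestamp = date.strftime("%Y%m%d")
        hour = str(date.hour).zfill(2)
        yield template.format(timestamp, hour, report_type, hour)
        date -= dt.timedelta(hours=1)

urls = list(hourly_urls("https://example.com/blend.{}/{}/text/blend_{}tx.t{}z", "nbs"))
print(len(urls))  # 24 -- iteration always terminates
```

Because the cutoff is fixed before the loop starts, the generator is guaranteed to exhaust even when no file at any URL exists, letting fetch return None instead of looping forever.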

Once the file is downloaded, the requested station and file-like object are passed to the _extract method to find and return the report from the file. This method will not be called if the file doesn't exist.
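The shape of that call can be shown with a simplified, hypothetical _extract stand-in, using io.StringIO in place of a downloaded file: the station ID and a file-like object go in, a report string (or None) comes out.

```python
import io
from typing import Optional, TextIO

def extract(station: str, source: TextIO) -> Optional[str]:
    """Return the line starting with `station`, or None if absent."""
    for line in source.read().splitlines():
        if line.startswith(station):
            return line.strip()
    return None

# Fake two-station file standing in for a downloaded report bulletin
fake_file = io.StringIO("KJFK 251200Z 28012KT 10SM FEW050\nKLGA 251200Z 29010KT 10SM CLR\n")
print(extract("KLGA", fake_file))  # the KLGA line only
```

Returning None when the station is missing mirrors the null response the service is expected to produce.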

""".. include:: ../../docs/service.md"""

from avwx.service.base import Service
from avwx.service.files import NoaaGfs, NoaaNbm
from avwx.service.scrape import (
    Amo,
    Aubom,
    Avt,
    # FaaNotam,
    Mac,
    Nam,
    Noaa,
    Olbs,
    get_service,
)

__all__ = (
    "get_service",
    "Noaa",
    "Amo",
    "Aubom",
    "Avt",
    "Mac",
    "Nam",
    "Olbs",
    # "FaaNotam",
    "NoaaGfs",
    "NoaaNbm",
    "Service",
)
def get_service(station: str, country_code: str) -> avwx.service.scrape.ScrapeService:
def get_service(station: str, country_code: str) -> ScrapeService:
    """Return the preferred scrape service for a given station.

    ```python
    # Fetch Australian reports
    station = "YWOL"
    country = "AU"  # can source from avwx.Station.country
    # Get the station's preferred service and initialize to fetch METARs
    service = avwx.service.get_service(station, country)("metar")
    # service is now avwx.service.Aubom init'd to fetch METARs
    # Fetch the current METAR
    report = service.fetch(station)
    ```
    """
    with suppress(KeyError):
        return PREFERRED[station[:2]]  # type: ignore
    return BY_COUNTRY.get(country_code, Noaa)  # type: ignore

Noaa = <class 'avwx.service.scrape.NoaaApi'>
class Amo(avwx.service.scrape.StationScrape):
class Amo(StationScrape):
    """Request data from AMO KMA for Korean stations."""

    _url = "http://amoapi.kma.go.kr/amoApi/{}"
    default_timeout = 60

    def _make_url(self, station: str) -> tuple[str, dict]:
        """Return a formatted URL and parameters."""
        return self._url.format(self.report_type), {"icao": station}

    def _extract(self, raw: str, station: str) -> str:  # noqa: ARG002
        """Extract the report message from XML response."""
        resp = parsexml(raw)
        try:
            report = resp["response"]["body"]["items"]["item"][f"{self.report_type.lower()}Msg"]
        except KeyError as key_error:
            raise self._make_err(raw) from key_error
        if not report:
            msg = "The station might not exist"
            raise self._make_err(msg)
        # Replace line breaks
        report = report.replace("\n", "")
        # Remove excess leading and trailing data
        for item in (self.report_type.upper(), "SPECI"):
            if report.startswith(f"{item} "):
                report = report[len(item) + 1 :]
        report = report.rstrip("=")
        # Make every element single-spaced and stripped
        return " ".join(report.split())

class Aubom(avwx.service.scrape.StationScrape):
class Aubom(StationScrape):
    """Request data from the Australian Bureau of Meteorology."""

    _url = "http://www.bom.gov.au/aviation/php/process.php"
    method = "POST"

    @staticmethod
    def _make_headers() -> dict:
        """Return request headers."""
        return {
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "*/*",
            "Accept-Language": "en-us",
            "Accept-Encoding": "gzip, deflate",
            "Host": "www.bom.gov.au",
            "Origin": "http://www.bom.gov.au",
            "User-Agent": secrets.choice(_USER_AGENTS),
            "Connection": "keep-alive",
        }

    def _post_data(self, station: str) -> dict:
        """Return the POST form."""
        return {"keyword": station, "type": "search", "page": "TAF"}

    def _extract(self, raw: str, station: str) -> str:  # noqa: ARG002
        """Extract the reports from HTML response."""
        index = 1 if self.report_type == "taf" else 2
        try:
            report = raw.split("<p")[index]
            report = report[report.find(">") + 1 :]
        except IndexError as index_error:
            msg = "The station might not exist"
            raise self._make_err(msg) from index_error
        if report.startswith("<"):
            return ""
        report = report[: report.find("</p>")]
        return report.replace("<br />", " ")

class Avt(avwx.service.scrape.StationScrape):
class Avt(StationScrape):
    """Request data from AVT/XiamenAir for China.
    NOTE: This should be replaced later with a gov+https source.
    """

    _url = "http://www.avt7.com/Home/AirportMetarInfo?airport4Code="

    def _make_url(self, station: str) -> tuple[str, dict]:
        """Return a formatted URL and empty parameters."""
        return self._url + station, {}

    def _extract(self, raw: str, station: str) -> str:  # noqa: ARG002
        """Extract the reports from HTML response."""
        try:
            data = json.loads(raw)
            key = f"{self.report_type.lower()}ContentList"
            text: str = data[key]["rows"][0]["content"]
        except (TypeError, json.decoder.JSONDecodeError, KeyError, IndexError):
            return ""
        else:
            return text


class Mac(avwx.service.scrape.StationScrape):
class Mac(StationScrape):
    """Request data from Meteorologia Aeronautica Civil for Colombian stations."""

    _url = "https://meteorologia.aerocivil.gov.co/expert_text_query/parse"
    method = "POST"

    @staticmethod
    def _make_headers() -> dict:
        """Return request headers."""
        return {"X-Requested-With": "XMLHttpRequest"}

    def _post_data(self, station: str) -> dict:
        """Return the POST form/data payload."""
        return {"query": f"{self.report_type} {station}"}

    def _extract(self, raw: str, station: str) -> str:
        """Extract the report message using string finding."""
        return self._simple_extract(raw, f"{station.upper()} ", "=")

class Nam(avwx.service.scrape.StationScrape):
class Nam(StationScrape):
    """Request data from NorthAviMet for North Atlantic and Nordic countries."""

    _url = "https://www.northavimet.com/NamConWS/rest/opmet/command/0/"

    def _make_url(self, station: str) -> tuple[str, dict]:
        """Return a formatted URL and empty parameters."""
        return self._url + station, {}

    def _extract(self, raw: str, station: str) -> str:
        """Extract the reports from HTML response."""
        starts = [f">{self.report_type.upper()} <", f">{station.upper()}<", "top'>"]
        report = self._simple_extract(raw, starts, "=")
        index = report.rfind(">")
        if index > -1:
            report = report[index + 1 :]
        return f"{station} {report.strip()}"


class Olbs(avwx.service.scrape.StationScrape):
class Olbs(StationScrape):
    """Request data from India OLBS flight briefing."""

    # _url = "https://olbs.amsschennai.gov.in/nsweb/FlightBriefing/showopmetquery.php"
    # method = "POST"

    # Temp redirect
    _url = "https://avbrief3.el.r.appspot.com/"

    def _make_url(self, station: str) -> tuple[str, dict]:
        """Return a formatted URL and empty parameters."""
        return self._url, {"icao": station}

    def _post_data(self, station: str) -> dict:
        """Return the POST form."""
        # Can set icaos to "V*" to return all results
        return {"icaos": station, "type": self.report_type}

    @staticmethod
    def _make_headers() -> dict:
        """Return request headers."""
        return {
            # "Content-Type": "application/x-www-form-urlencoded",
            # "Accept": "text/html, */*; q=0.01",
            # "Accept-Language": "en-us",
            "Accept-Encoding": "gzip, deflate, br",
            # "Host": "olbs.amsschennai.gov.in",
            "User-Agent": secrets.choice(_USER_AGENTS),
            "Connection": "keep-alive",
            # "Referer": "https://olbs.amsschennai.gov.in/nsweb/FlightBriefing/",
            # "X-Requested-With": "XMLHttpRequest",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Referer": "https://avbrief3.el.r.appspot.com/",
            "Host": "avbrief3.el.r.appspot.com",
        }

    def _extract(self, raw: str, station: str) -> str:
        """Extract the reports from HTML response."""
        # start = raw.find(f"{self.report_type.upper()} {station} ")
        return self._simple_extract(raw, [f">{self.report_type.upper()}</div>", station], ["=", "<"])


class NoaaGfs(avwx.service.files.NoaaForecast):
class NoaaGfs(NoaaForecast):
    """Request forecast data from NOAA GFS FTP servers."""

    _url = "https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfsmos.{}/mdl_gfs{}.t{}z"
    _valid_types = ("mav", "mex")

    _cycles: ClassVar[dict[str, tuple[int, ...]]] = {"mav": (0, 6, 12, 18), "mex": (0, 12)}

    @property
    def _urls(self) -> Iterator[str]:
        """Iterate through update cycles no older than one day."""
        warnings.warn(
            "GFS fetch has been deprecated due to NOAA retiring the format. Migrate to NBM for similar data",
            DeprecationWarning,
            stacklevel=2,
        )
        now = dt.datetime.now(tz=dt.timezone.utc)
        date = dt.datetime.now(tz=dt.timezone.utc)
        cutoff = date - dt.timedelta(days=1)
        while date > cutoff:
            for cycle in reversed(self._cycles[self.report_type]):
                date = date.replace(hour=cycle)
                if date > now:
                    continue
                timestamp = date.strftime(r"%Y%m%d")
                hour = str(date.hour).zfill(2)
                yield self._url.format(timestamp, self.report_type, hour)
            date -= dt.timedelta(hours=1)

    def _index_target(self, station: str) -> tuple[str, str]:
        return f"{station}   GFS", f"{self.report_type.upper()} GUIDANCE"


class NoaaNbm(avwx.service.files.NoaaForecast):
class NoaaNbm(NoaaForecast):
    """Request forecast data from NOAA NBM FTP servers."""

    _url = "https://nomads.ncep.noaa.gov/pub/data/nccf/com/blend/prod/blend.{}/{}/text/blend_{}tx.t{}z"
    _valid_types = ("nbh", "nbs", "nbe", "nbx")

    @property
    def _urls(self) -> Iterator[str]:
        """Iterate through hourly updates no older than one day."""
        date = dt.datetime.now(tz=dt.timezone.utc)
        cutoff = date - dt.timedelta(days=1)
        while date > cutoff:
            timestamp = date.strftime(r"%Y%m%d")
            hour = str(date.hour).zfill(2)
            yield self._url.format(timestamp, hour, self.report_type, hour)
            date -= dt.timedelta(hours=1)

    def _index_target(self, station: str) -> tuple[str, str]:
        return f"{station}   ", f"{self.report_type.upper()} GUIDANCE"


class Service:
class Service:
    """Base Service class for fetching reports."""

    report_type: str
    _url: ClassVar[str] = ""
    _valid_types: ClassVar[tuple[str, ...]] = ()

    def __init__(self, report_type: str):
        if self._valid_types and report_type not in self._valid_types:
            msg = f"'{report_type}' is not a valid report type for {self.__class__.__name__}. Expected {self._valid_types}"
            raise ValueError(msg)
        self.report_type = report_type

    @property
    def root(self) -> str | None:
        """Return the service's root URL."""
        if self._url is None:
            return None
        url = self._url[self._url.find("//") + 2 :]
        return url[: url.find("/")]

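The slicing in root can be traced by hand with one of the URLs above:

```python
# Same two slices as Service.root, applied to the NBM base URL
url = "https://nomads.ncep.noaa.gov/pub/data/nccf/com/blend/prod/"
root = url[url.find("//") + 2 :]   # drop the scheme
root = root[: root.find("/")]      # keep up to the first path separator
print(root)  # nomads.ncep.noaa.gov
```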