Sat, 09 Jan 2021 19:41:20 +0200
Merge to close wrong branch.
README.rst | file | annotate | diff | comparison | revisions | |
scc_access/scc_access.py | file | annotate | diff | comparison | revisions | |
settings_sample.yaml | file | annotate | diff | comparison | revisions |
--- a/.hgignore Sat Jan 09 17:29:41 2021 +0200 +++ b/.hgignore Sat Jan 09 19:41:20 2021 +0200 @@ -1,6 +1,8 @@ -syntax: glob -*.rst~ -*.py~ -*.pyc -re:^settings\.py$ -re:^\.idea/ +syntax: glob +*.rst~ +*.py~ +*.pyc +re:^settings\.py$ +re:^\.idea/ +scc_access.egg-info/ +re:^\.pytest_cache/ \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/.hgtags Sat Jan 09 19:41:20 2021 +0200 @@ -0,0 +1,5 @@ +609a3f4b3c27f2179a4dacf81ee8f04c9e1a33d4 0.7.0 +b51ba2647b41f0ec66b297e9a4fe7eaaff46a431 0.7.1 +1b6786e9865db8b2c451fa1afb5ac2fe052a8b85 0.8.0 +8acea12976c4af6c71ac3c224be530a1cbde1a21 0.8.1 +9446b979264bd3e04a8e74133a0eabb4d24fcfc1 0.9.0
--- a/CHANGELOG.rst Sat Jan 09 17:29:41 2021 +0200 +++ b/CHANGELOG.rst Sat Jan 09 19:41:20 2021 +0200 @@ -1,5 +1,27 @@ Changelog ========= +0.9.0 - 2021-01-09 +------------------ +* Added force_upload option +* Removed the process subcommand, added the --process option to the `upload_file` subcommand. +* Homogenised download directory names. + +0.8.1 - 2019-12-19 +------------------ +* Correct handling of ancillary file full paths (thanks to Marc-Antoine Drouin) + +0.8.0 - 2019-12-19 +------------------ +* Check if ancillary file is already in the SCC DB before uploading. + +0.7.1 - 2019-12-05 +------------------ +* Fixed handling of both old- and new-style measurement ids. + +0.7.0 - 2019-01-11 +------------------ +* Download method for HiRElPP, cloudmask, and other new datasets. +* Since 0.6.2: Restructuring of input arguments, ancillary file upload, code improvements. 0.6.2 - 2018-01-10 ------------------
--- a/README.rst Sat Jan 09 17:29:41 2021 +0200 +++ b/README.rst Sat Jan 09 19:41:20 2021 +0200 @@ -19,30 +19,37 @@ Any suggestions for improvements and new features are more than welcome. - Installation ------------ The easiest way to install this module is from the python package index using pip:: - pip install hg+https://repositories.imaa.cnr.it/public/scc_access/#egg=scc-access + pip install hg+https://repositories.imaa.cnr.it/public/scc_access#egg=scc-access + +or, if you don't have mercurial on your system:: + + pip install https://repositories.imaa.cnr.it/public/scc_access/archive/tip.zip You can also use the script by cloning this mercurial repository. - Settings -------- -You will need to provide some user-defined settings in a .yaml format. You -can rename the config_sample.yaml file to e.g. config.yaml and follow the instructions -inside the file. +The required user-defined settings need to be specified in a .yaml file. -Specifically, you will need to: +The following parameters should be specified:: -1. Change the BASIC_LOGIN and DJANGO_LOGIN to your credentials. -2. Change the OUTPUT_DIR to the location were the results will be stored. + basic_credentials: ['username', 'password'] # The HTTP user name and password that is needed to access the SCC site. + website_credentials: ['username', 'password'] # The user name and password that is needed to log in to the SCC site. + output_dir: /path/to/files/scc_output/ # The directory to download the files + base_url: https://scc.imaa.cnr.it/ # SCC base URL. Normally you shouldn't need to change that. + -Please not that it's not a good idea to store your stations management credentials in the settings -file. The standard user has "Station Management" privileges and if the credentials +The repository includes a `settings_sample.yaml` file that you can use as a starting point. Rename the file, e.g. to +`settings.yaml` and input the required parameters. If you don't want to specify the file path every time +your run the `scc_access` script, you can name the file `.scc_access.yaml` and place it in your home directory. + +Please note that it's not a good idea to store your stations management credentials in the settings +file. The standard user has "Station Management" privileges and, if the credentials are stolen, someone could change/delete the stations settings from the SCC database. For this, it is better to use a user account with minimum access settings, i.e. that can only upload files and measurements. @@ -53,28 +60,85 @@ You can upload a file specifying the username and the system id:: - scc_access ./config.yaml ./20110101po01.nc 125 + scc_access upload-file 20110101po01.nc 125 If you want to wait for the processing to finish and download the resulting files -you need to define the -p flag:: +you can use add the `-p` or `--process` flag:: - scc_access ./config.yaml ./20110101po01.nc 125 -p + scc_access upload-file 20110101po01.nc 125 -p -If the provieded measurement ID is already registerd on the SCC, the upload will be rejected. You can -instruct the script to first delete the existing measurement using the --force_upload flag:: +The two command above assume that you have placed your setting file in the default location. You can specify a +custom location using the -c flag:: - scc_access ./config.yaml ./20110101po01.nc 125 --force_upload -p + scc_access -c ./path/to/settings.yaml upload-file 20110101po01.nc 125 + +By default, the SCC will reject an uploaded file, if the specified measurement id already exists on the server. You +can instruct the script to delete any existing measurement before uploading using the `--force_upload` flag:: -You can restart the processing chain on a particular measurements using either the --rerun-all or ---rerun-elpp options and specifying an existing measurement ID. E.g:: + scc_access upload-file 20110101po01.nc 125 -p --force_upload - scc_access ./config.yaml --rerun-elpp 20110101po02 +If you want to delete an existing measurement id from the database use the `delete` +command and the measurement id:: + + scc_access delete 20110101po01 -If you want to delete an existing measurement from the database use the --delete option and -the measurement id:: - - scc_access --delete 20110101po01 +You can list available measurements with the `list` command:: + + scc_access list + +.. note:: + The `list` command needs to be updated. Cross-check the results before using them. For more information on the syntax type:: scc_access -h + +This will produce the following help text:: + + usage: scc_access [-h] [-d] [-s] [-c CONFIG] + {delete,rerun-all,rerun-elpp,upload-file,list,download} ... + + positional arguments: + {delete,rerun-all,rerun-elpp,upload-file,list,download} + delete Deletes a measurement. + rerun-all Rerun all processing steps for the provided + measurement IDs. + rerun-elpp Rerun low-resolution processing steps for the provided + measurement ID. + upload-file Submit a file and, optionally, download the output + products. + list List measurements registered on the SCC. + download Download selected measurements. + + optional arguments: + -h, --help show this help message and exit + -d, --debug Print debugging information. + -s, --silent Show only warning and error messages. + -c CONFIG, --config CONFIG + Path to the config file. + +You can find out more information about each command e.g. like this:: + + scc_access upload-file -h + +In this case, the help text will give more details about the `upload-file` command:: + + usage: scc_access upload-file [-h] [-p] [--force_upload] + [--radiosounding RADIOSOUNDING] + [--overlap OVERLAP] [--lidarratio LIDARRATIO] + filename system + + positional arguments: + filename Measurement file name or path. + system Processing system id. + + optional arguments: + -h, --help show this help message and exit + -p, --process Wait for the processing results. + --force_upload If measurement ID exists on SCC, delete before + uploading. + --radiosounding RADIOSOUNDING + Radiosounding file name or path + --overlap OVERLAP Overlap file name or path + --lidarratio LIDARRATIO + Lidar ratio file name or path
--- a/requirements.txt Sat Jan 09 17:29:41 2021 +0200 +++ b/requirements.txt Sat Jan 09 19:41:20 2021 +0200 @@ -1,1 +1,1 @@ -requests +. \ No newline at end of file
--- a/scc_access/__init__.py Sat Jan 09 17:29:41 2021 +0200 +++ b/scc_access/__init__.py Sat Jan 09 19:41:20 2021 +0200 @@ -1,1 +1,1 @@ -__version__ = "0.6.2" \ No newline at end of file +__version__ = "0.9.0" \ No newline at end of file
--- a/scc_access/scc_access.py Sat Jan 09 17:29:41 2021 +0200 +++ b/scc_access/scc_access.py Sat Jan 09 19:41:20 2021 +0200 @@ -2,104 +2,111 @@ import requests -# Python 2 and 3 support try: import urllib.parse as urlparse # Python 3 except ImportError: - from urlparse import urlparse # Python 2 + import urlparse # Python 2 import argparse +import datetime +import logging import os import re +from io import BytesIO + import time -from io import BytesIO + from zipfile import ZipFile -import datetime -import logging + import yaml import netCDF4 as netcdf requests.packages.urllib3.disable_warnings() - logger = logging.getLogger(__name__) # The regex to find the measurement id from the measurement page # This should be read from the uploaded file, but would require an extra NetCDF module. -regex = "<h3>Measurement (?P<measurement_id>.{12}) <small>" +regex = "<h3>Measurement (?P<measurement_id>.{12,15}) <small>" # {12, 15} to handle both old- and new-style measurement ids. class SCC: - """ A simple class that will attempt to upload a file on the SCC server. + """A simple class that will attempt to upload a file on the SCC server. The uploading is done by simulating a normal browser session. In the current - version no check is performed, and no feedback is given if the upload - was successful. If everything is setup correctly, it will work. + version no check is performed, and no feedback is given if the upload + was successful. If everything is setup correctly, it will work. """ def __init__(self, auth, output_dir, base_url): + self.auth = auth self.output_dir = output_dir self.base_url = base_url self.session = requests.Session() + self.session.auth = auth + self.session.verify = False - # Construct the absolute URLs self.login_url = urlparse.urljoin(self.base_url, 'accounts/login/') + self.logout_url = urlparse.urljoin(self.base_url, 'accounts/logout/') + self.list_measurements_url = urlparse.urljoin(self.base_url, 'data_processing/measurements/') + self.upload_url = urlparse.urljoin(self.base_url, 'data_processing/measurements/quick/') self.download_hirelpp_pattern = urlparse.urljoin(self.base_url, - 'data_processing/measurements/{0}/download-hirelpp/') + 'data_processing/measurements/{0}/download-hirelpp/') self.download_cloudmask_pattern = urlparse.urljoin(self.base_url, - 'data_processing/measurements/{0}/download-cloudmask/') + 'data_processing/measurements/{0}/download-cloudmask/') + self.download_elpp_pattern = urlparse.urljoin(self.base_url, - 'data_processing/measurements/{0}/download-preprocessed/') + 'data_processing/measurements/{0}/download-preprocessed/') self.download_elda_pattern = urlparse.urljoin(self.base_url, - 'data_processing/measurements/{0}/download-optical/') - self.download_plot_pattern = urlparse.urljoin(self.base_url, - 'data_processing/measurements/{0}/download-plots/') + 'data_processing/measurements/{0}/download-optical/') + self.download_plots_pattern = urlparse.urljoin(self.base_url, + 'data_processing/measurements/{0}/download-plots/') self.download_elic_pattern = urlparse.urljoin(self.base_url, - 'data_processing/measurements/{0}/download-elic/') + 'data_processing/measurements/{0}/download-elic/') self.delete_measurement_pattern = urlparse.urljoin(self.base_url, 'admin/database/measurements/{0}/delete/') + self.api_base_url = urlparse.urljoin(self.base_url, 'api/v1/') - - self.login_credentials = None + self.api_measurement_pattern = urlparse.urljoin(self.api_base_url, 'measurements/{0}/') + self.api_measurements_url = urlparse.urljoin(self.api_base_url, 'measurements') + self.api_sounding_search_pattern = urlparse.urljoin(self.api_base_url, 'sounding_files/?filename={0}') + self.api_lidarratio_search_pattern = urlparse.urljoin(self.api_base_url, 'lidarratio_files/?filename={0}') + self.api_overlap_search_pattern = urlparse.urljoin(self.api_base_url, 'overlap_files/?filename={0}') def login(self, credentials): - """ Login the the website. """ + """ Login to SCC. """ logger.debug("Attempting to login to SCC, username %s." % credentials[0]) - self.login_credentials = {'username': credentials[0], - 'password': credentials[1]} + login_credentials = {'username': credentials[0], + 'password': credentials[1]} logger.debug("Accessing login page at %s." % self.login_url) # Get upload form - login_page = self.session.get(self.login_url, auth=self.auth, verify=False) + login_page = self.session.get(self.login_url) - if login_page.status_code != 200: - logger.error('Could not access login pages. Status code %s' % login_page.status_code) - sys.exit(1) + if not login_page.ok: + raise self.PageNotAccessibleError('Could not access login pages. Status code %s' % login_page.status_code) - logger.debug("Submiting credentials.") - + logger.debug("Submitting credentials.") # Submit the login data login_submit = self.session.post(self.login_url, - data=self.login_credentials, + data=login_credentials, headers={'X-CSRFToken': login_page.cookies['csrftoken'], - 'referer': self.login_url}, - verify=False, - auth=self.auth) + 'referer': self.login_url}) return login_submit def logout(self): - pass + """ Logout from SCC """ + return self.session.get(self.logout_url, stream=True) - def upload_file(self, filename, system_id, force_upload, delete_related): - """ Upload a filename for processing with a specific system. If the + def upload_file(self, filename, system_id, force_upload, delete_related, rs_filename=None, ov_filename=None, lr_filename=None): + """ Upload a filename for processing with a specific system. If the upload is successful, it returns the measurement id. """ - measurement_id = self.measurement_id_from_file(filename) logger.debug('Checking if a measurement with the same id already exists on the SCC server.') - existing_measurement = self.get_measurement(measurement_id) + existing_measurement, _ = self.get_measurement(measurement_id) if existing_measurement: if force_upload: @@ -113,25 +120,47 @@ sys.exit(1) # Get submit page - upload_page = self.session.get(self.upload_url, - auth=self.auth, - verify=False) + upload_page = self.session.get(self.upload_url) # Submit the data upload_data = {'system': system_id} files = {'data': open(filename, 'rb')} + if rs_filename is not None: + ancillary_file, _ = self.get_ancillary(rs_filename, 'sounding') + + if ancillary_file.already_on_scc: + logger.warning("Sounding file {0.filename} already on the SCC with id {0.id}. Ignoring it.".format(ancillary_file)) + else: + logger.debug('Adding sounding file %s' % rs_filename) + files['sounding_file'] = open(rs_filename, 'rb') + + if ov_filename is not None: + ancillary_file, _ = self.get_ancillary(ov_filename, 'overlap') + + if ancillary_file.already_on_scc: + logger.warning("Overlap file {0.filename} already on the SCC with id {0.id}. Ignoring it.".format(ancillary_file)) + else: + logger.debug('Adding overlap file %s' % ov_filename) + files['overlap_file'] = open(ov_filename, 'rb') + + if lr_filename is not None: + ancillary_file, _ = self.get_ancillary(lr_filename, 'lidarratio') + + if ancillary_file.already_on_scc: + logger.warning( + "Lidar ratio file {0.filename} already on the SCC with id {0.id}. Ignoring it.".format(ancillary_file)) + else: + logger.debug('Adding lidar ratio file %s' % lr_filename) + files['lidar_ratio_file'] = open(lr_filename, 'rb') + logger.info("Uploading of file %s started." % filename) - logger.debug("URL: {0}, data: {1}, 'X-CSRFToken': {2}".format(self.upload_url, - upload_data, - upload_page.cookies['csrftoken'])) + upload_submit = self.session.post(self.upload_url, data=upload_data, files=files, headers={'X-CSRFToken': upload_page.cookies['csrftoken'], - 'referer': self.upload_url, }, - verify=False, - auth=self.auth) + 'referer': self.upload_url}) if upload_submit.status_code != 200: logger.warning("Connection error. Status code: %s" % upload_submit.status_code) @@ -140,7 +169,7 @@ # Check if there was a redirect to a new page. if upload_submit.url == self.upload_url: measurement_id = False - logger.error("Uploaded file rejected! Try to upload manually to see the error.") + logger.error("Uploaded file(s) rejected! Try to upload manually to see the error.") else: measurement_id = re.findall(regex, upload_submit.text)[0] logger.info("Successfully uploaded measurement with id %s." % measurement_id) @@ -168,13 +197,15 @@ def download_files(self, measurement_id, subdir, download_url): """ Downloads some files from the download_url to the specified - subdir. This method is used to download preprocessed file, optical + subdir. This method is used to download preprocessed file, optical files etc. """ + # TODO: Make downloading more robust (e.g. in case that files do not exist on server). # Get the file - request = self.session.get(download_url, auth=self.auth, - verify=False, - stream=True) + request = self.session.get(download_url, stream=True) + + if not request.ok: + raise Exception("Could not download files for measurement '%s'" % measurement_id) # Create the dir if it does not exist local_dir = os.path.join(self.output_dir, measurement_id, subdir) @@ -204,7 +235,7 @@ # Construct the download url download_url = self.download_hirelpp_pattern.format(measurement_id) try: - self.download_files(measurement_id, 'scc_hirelpp', download_url) + self.download_files(measurement_id, 'hirelpp', download_url) except Exception as e: logger.error("Could not download HiRElPP files. Error message: {}".format(e)) logger.debug('Download exception:', exc_info=True) @@ -214,7 +245,7 @@ # Construct the download url download_url = self.download_cloudmask_pattern.format(measurement_id) try: - self.download_files(measurement_id, 'scc_cloudscreen', download_url) + self.download_files(measurement_id, 'cloudscreen', download_url) except Exception as e: logger.error("Could not download cloudscreen files. Error message: {}".format(e)) logger.debug('Download exception:', exc_info=True) @@ -224,7 +255,7 @@ # Construct the download url download_url = self.download_elpp_pattern.format(measurement_id) try: - self.download_files(measurement_id, 'scc_preprocessed', download_url) + self.download_files(measurement_id, 'elpp', download_url) except Exception as e: logger.error("Could not download ElPP files. Error message: {}".format(e)) logger.debug('Download exception:', exc_info=True) @@ -234,7 +265,7 @@ # Construct the download url download_url = self.download_elda_pattern.format(measurement_id) try: - self.download_files(measurement_id, 'scc_optical', download_url) + self.download_files(measurement_id, 'elda', download_url) except Exception as e: logger.error("Could not download ELDA files. Error message: {}".format(e)) logger.debug('Download exception:', exc_info=True) @@ -242,9 +273,9 @@ def download_plots(self, measurement_id): """ Download profile graphs for the measurement id. """ # Construct the download url - download_url = self.download_plot_pattern.format(measurement_id) + download_url = self.download_plots_pattern.format(measurement_id) try: - self.download_files(measurement_id, 'scc_plots', download_url) + self.download_files(measurement_id, 'elda_plots', download_url) except Exception as e: logger.error("Could not download ELDA plots. Error message: {}".format(e)) logger.debug('Download exception:', exc_info=True) @@ -254,7 +285,7 @@ # Construct the download url download_url = self.download_elic_pattern.format(measurement_id) try: - self.download_files(measurement_id, 'scc_elic', download_url) + self.download_files(measurement_id, 'elic', download_url) except Exception as e: logger.error("Could not download ELIC files. Error message: {}".format(e)) logger.debug('Download exception:', exc_info=True) @@ -264,23 +295,26 @@ # Construct the download url download_url = self.download_elda_pattern.format(measurement_id) # ELDA patter is used for now try: - self.download_files(measurement_id, 'scc_eldec', download_url) + self.download_files(measurement_id, 'eldec', download_url) except Exception as e: logger.error("Could not download EDELC files. Error message: {}".format(e)) logger.debug('Download exception:', exc_info=True) def rerun_elpp(self, measurement_id, monitor=True): - measurement = self.get_measurement(measurement_id) + logger.debug("Started rerun_elpp procedure.") + + logger.debug("Getting measurement %s" % measurement_id) + measurement, status = self.get_measurement(measurement_id) if measurement: - request = self.session.get(measurement.rerun_elpp_url, auth=self.auth, - verify=False, - stream=True) + logger.debug("Attempting to rerun ElPP through %s." % measurement.rerun_all_url) + request = self.session.get(measurement.rerun_elpp_url, stream=True) if request.status_code != 200: logger.error( - "Could not rerun ELPP for %s. Status code: %s" % (measurement_id, request.status_code)) - return + "Could not rerun processing for %s. Status code: %s" % (measurement_id, request.status_code)) + else: + logger.info("Rerun-elpp command submitted successfully for id {}.".format(measurement_id)) if monitor: self.monitor_processing(measurement_id) @@ -289,42 +323,69 @@ logger.debug("Started rerun_all procedure.") logger.debug("Getting measurement %s" % measurement_id) - measurement = self.get_measurement(measurement_id) + measurement, status = self.get_measurement(measurement_id) if measurement: logger.debug("Attempting to rerun all processing through %s." % measurement.rerun_all_url) - request = self.session.get(measurement.rerun_all_url, auth=self.auth, - verify=False, - stream=True) + request = self.session.get(measurement.rerun_all_url, stream=True) if request.status_code != 200: logger.error("Could not rerun pre processing for %s. Status code: %s" % (measurement_id, request.status_code)) - return + else: + logger.info("Rerun-all command submitted successfully for id {}.".format(measurement_id)) if monitor: self.monitor_processing(measurement_id) - def process(self, filename, system_id, force_upload, delete_related): + def process(self, filename, system_id, monitor, force_upload, delete_related, rs_filename=None, lr_filename=None, ov_filename=None): """ Upload a file for processing and wait for the processing to finish. If the processing is successful, it will download all produced files. """ logger.info("--- Processing started on %s. ---" % datetime.datetime.now()) + # Upload file + logger.info("Uploading file.") + measurement_id = self.upload_file(filename, system_id, force_upload, delete_related, + rs_filename=rs_filename, + lr_filename=lr_filename, + ov_filename=ov_filename) - # Upload file - measurement_id = self.upload_file(filename, system_id, force_upload, delete_related) + if measurement_id and monitor: + logger.info("Monitoring processing") + return self.monitor_processing(measurement_id) - measurement = self.monitor_processing(measurement_id) - return measurement + return None def monitor_processing(self, measurement_id): """ Monitor the processing progress of a measurement id""" - measurement = self.get_measurement(measurement_id) + # try to deal with error 404 + error_count = 0 + error_max = 6 + time_sleep = 10 + + # try to wait for measurement to appear in API + measurement = None + logger.info("Looking for measurement %s on the SCC.", measurement_id) + while error_count < error_max: + time.sleep(time_sleep) + measurement, status = self.get_measurement(measurement_id) + if status != 200 and error_count < error_max: + logger.error("Measurement not found. waiting %ds", time_sleep) + error_count += 1 + else: + break + + if error_count == error_max: + logger.critical("Measurement %s doesn't seem to exist", measurement_id) + sys.exit(1) + + logger.info('Measurement %s found.', measurement_id) + if measurement is not None: while measurement.is_running: - logger.info("Measurement is being processed (status: {}, {}, {}, {}, {}, {}). Please wait.".format( + logger.info("Measurement is being processed. status: {}, {}, {}, {}, {}, {}). Please wait.".format( measurement.upload, measurement.hirelpp, measurement.cloudmask, @@ -332,14 +393,9 @@ measurement.elda, measurement.elic)) time.sleep(10) - measurement = self.get_measurement(measurement_id) - logger.info("Measurement processing finished (status: {}, {}, {}, {}, {}, {}). Please wait.".format( - measurement.upload, - measurement.hirelpp, - measurement.cloudmask, - measurement.elpp, - measurement.elda, - measurement.elic)) + measurement, status = self.get_measurement(measurement_id) + + logger.info("Measurement processing finished.") if measurement.hirelpp == 127: logger.info("Downloading HiRElPP files.") self.download_hirelpp(measurement_id) @@ -347,12 +403,12 @@ logger.info("Downloading cloud screening files.") self.download_cloudmask(measurement_id) if measurement.elpp == 127: - logger.info("Downloading ELPP files.") + logger.info("Downloading ElPP files.") self.download_elpp(measurement_id) if measurement.elda == 127: logger.info("Downloading ELDA files.") self.download_elda(measurement_id) - logger.info("Downloading graphs.") + logger.info("Downloading ELDA plots.") self.download_plots(measurement_id) if measurement.elic == 127: logger.info("Downloading ELIC files.") @@ -362,47 +418,45 @@ if measurement.is_calibration: logger.info("Downloading ELDEC files.") self.download_eldec(measurement_id) + logger.info("--- Processing finished. ---") - logger.info("--- Processing finished. ---") return measurement def get_measurement(self, measurement_id): - if measurement_id is None: + if measurement_id is None: # Is this still required? return None - measurement_url = urlparse.urljoin(self.api_base_url, 'measurements/%s/' % measurement_id) + measurement_url = self.api_measurement_pattern.format(measurement_id) + logger.debug("Measurement API URL: %s" % measurement_url) - response = self.session.get(measurement_url, - auth=self.auth, - verify=False) + response = self.session.get(measurement_url) response_dict = None + if response.status_code == 200: response_dict = response.json() - if response.status_code == 404: + elif response.status_code == 404: logger.info("No measurement with id %s found on the SCC." % measurement_id) - elif response.status_code != 200: + else: logger.error('Could not access API. Status code %s.' % response.status_code) - sys.exit(1) - - logger.debug("Response dictionary: {}".format(response_dict)) if response_dict: measurement = Measurement(self.base_url, response_dict) - return measurement else: - return None + measurement = None + + return measurement, response.status_code - def delete_measurement(self, measurement_id, delete_related=False): + def delete_measurement(self, measurement_id, delete_related): """ Deletes a measurement with the provided measurement id. The user - should have the appropriate permissions. - + should have the appropriate permissions. + The procedures is performed directly through the web interface and NOT through the API. """ # Get the measurement object - measurement = self.get_measurement(measurement_id) + measurement, _ = self.get_measurement(measurement_id) # Check that it exists if measurement is None: @@ -410,13 +464,9 @@ return None # Go the the page confirming the deletion - delete_url = self.delete_measurement_pattern.format(measurement.id) - - logger.debug("Delete url: {}".format(delete_url)) + delete_url = self.delete_measurement_pattern.format(measurement_id) - confirm_page = self.session.get(delete_url, - auth=self.auth, - verify=False) + confirm_page = self.session.get(delete_url) # Check that the page opened properly if confirm_page.status_code != 200: @@ -431,27 +481,22 @@ # Delete the measurement delete_page = self.session.post(delete_url, - auth=self.auth, - verify=False, data={'post': 'yes', 'select_delete_related_measurements': delete_related_option}, headers={'X-CSRFToken': confirm_page.cookies['csrftoken'], 'referer': delete_url} ) - if delete_page.status_code != 200: + if not delete_page.ok: logger.warning("Something went wrong. Delete page status: {0}".format( delete_page.status_code)) return None - logger.info("Deleted measurement {0}.".format(measurement_id)) + logger.info("Deleted measurement {0}".format(measurement_id)) return True def available_measurements(self): """ Get a list of available measurement on the SCC. """ - measurement_url = urlparse.urljoin(self.api_base_url, 'measurements') - response = self.session.get(measurement_url, - auth=self.auth, - verify=False) + response = self.session.get(self.api_measurements_url) response_dict = response.json() if response_dict: @@ -459,21 +504,57 @@ measurements = [Measurement(self.base_url, measurement_dict) for measurement_dict in measurement_list] logger.info("Found %s measurements on the SCC." % len(measurements)) else: + logger.warning("No response received from the SCC when asked for available measurements.") measurements = None - logger.warning("No response received from the SCC when asked for available measurements.") return measurements + def list_measurements(self, station=None, system=None, start=None, stop=None, upload_status=None, + processing_status=None, optical_processing=None): + + # TODO: Change this to work through the API + + # Need to set to empty string if not specified, we won't get any results + params = { + "station": station if station is not None else "", + "system": system if system is not None else "", + "stop": stop if stop is not None else "", + "start": start if start is not None else "", + "upload_status": upload_status if upload_status is not None else "", + "preprocessing_status": processing_status if processing_status is not None else "", + "optical_processing_status": optical_processing if optical_processing is not None else "" + } + + response_txt = self.session.get(self.list_measurements_url, params=params).text + tbl_rgx = re.compile(r'<table id="measurements">(.*?)</table>', re.DOTALL) + entry_rgx = re.compile(r'<tr>(.*?)</tr>', re.DOTALL) + measurement_rgx = re.compile( + r'.*?<td><a[^>]*>(\w+)</a>.*?<td>.*?<td>([\w-]+ [\w:]+)</td>.*<td data-order="([-]?\d+),([-]?\d+),([-]?\d+)".*', + re.DOTALL) + matches = tbl_rgx.findall(response_txt) + if len(matches) != 1: + return [] + + ret = [] + for entry in entry_rgx.finditer(matches[0]): + m = measurement_rgx.match(entry.string[entry.start(0):entry.end(0)]) + if m: + name, date, upload, preproc, optical = m.groups() + ret.append( + Measurement(self.base_url, {"id": name, "upload": int(upload), "pre_processing": int(preproc), + "processing": int(optical)})) + + return ret + def measurement_id_for_date(self, t1, call_sign, base_number=0): """ Give the first available measurement id on the SCC for the specific - date. + date. """ date_str = t1.strftime('%Y%m%d') - search_url = urlparse.urljoin(self.api_base_url, 'measurements/?id__startswith=%s' % date_str) + base_id = "%s%s" % (date_str, call_sign) + search_url = urlparse.urljoin(self.api_base_url, 'measurements/?id__startswith=%s' % base_id) - response = self.session.get(search_url, - auth=self.auth, - verify=False) + response = self.session.get(search_url) response_dict = response.json() @@ -481,19 +562,67 @@ if response_dict: measurement_list = response_dict['objects'] + + if len(measurement_list) == 100: + raise ValueError('No available measurement id found.') + existing_ids = [measurement_dict['id'] for measurement_dict in measurement_list] measurement_number = base_number - measurement_id = "%s%s%04i" % (date_str, call_sign, measurement_number) + measurement_id = "%s%02i" % (base_id, measurement_number) while measurement_id in existing_ids: measurement_number = measurement_number + 1 - measurement_id = "%s%s%04i" % (date_str, call_sign, measurement_number) - if measurement_number == 1000: - raise ValueError('No available measurement id found.') + measurement_id = "%s%02i" % (base_id, measurement_number) return measurement_id + def get_ancillary(self, file_path, file_type): + """ + Try to get the ancillary file data from the SCC API. + + The result will always be an API object. If the file does not exist, the .exists property is set to False. + + Parameters + ---------- + file_path : str + Path of the uploaded file. + file_type : str + Type of ancillary file. One of 'sounding', 'overlap', 'lidarratio'. + + Returns + : AncillaryFile + The api object. + """ + assert file_type in ['sounding', 'overlap', 'lidarratio'] + + filename = os.path.basename(file_path) + + if file_type == 'sounding': + file_url = self.api_sounding_search_pattern.format(filename) + elif file_type == 'overlap': + file_url = self.api_overlap_search_pattern.format(filename) + else: + file_url = self.api_lidarratio_search_pattern.format(filename) + + response = self.session.get(file_url) + + if not response.ok: + logger.error('Could not access API. Status code %s.' % response.status_code) + return None, response.status_code + + response_dict = response.json() + object_list = response_dict['objects'] + + logger.debug("Ancillary file JSON: {0}".format(object_list)) + + if object_list: + ancillary_file = AncillaryFile(self.api_base_url, object_list[0]) # Assume only one file is returned + else: + ancillary_file = AncillaryFile(self.api_base_url, None) # Create an empty object + + return ancillary_file, response.status_code + def __enter__(self): return self @@ -501,15 +630,36 @@ logger.debug("Closing SCC connection session.") self.session.close() + class PageNotAccessibleError(RuntimeError): + pass -class Measurement: + +class ApiObject(object): + """ A generic class object. """ + + def __init__(self, base_url, dict_response): + self.base_url = base_url + + if dict_response: + # Add the dictionary key value pairs as object properties + for key, value in dict_response.items(): + # logger.debug('Setting key {0} to value {1}'.format(key, value)) + try: + setattr(self, key, value) + except: + logger.warning('Could not set attribute {0} to value {1}'.format(key, value)) + self.exists = True + else: + self.exists = False + + +class Measurement(ApiObject): """ This class represents the measurement object as returned in the SCC API. """ def __init__(self, base_url, dict_response): - self.base_url = base_url - # Define expected attributes to assist debuggin + # Define expected attributes to assist debugging self.cloudmask = None self.elda = None self.elic = None @@ -526,13 +676,7 @@ self.system = None self.upload = None - if dict_response: - # Add the dictionary key value pairs as object properties - for key, value in dict_response.items(): - setattr(self, key, value) - self.exists = True - else: - self.exists = False + super().__init__(base_url, dict_response) @property def rerun_elda_url(self): @@ -553,77 +697,119 @@ return "Measurement {}".format(self.id) -def upload_file(filename, system_id, force_upload, delete_related, settings): - """ Shortcut function to upload a file to the SCC. """ - logger.info("Uploading file %s, using system %s." % (filename, system_id)) +class AncillaryFile(ApiObject): + """ This class represents the ancilalry file object as returned in the SCC API. + """ + @property + def already_on_scc(self): + if self.exists is False: + return False + + return not self.status == 'missing' + + def __str__(self): + return "%s: %s, %s" % (self.id, + self.filename, + self.status) + + +def process_file(filename, system_id, settings, force_upload, delete_related, + monitor=True, rs_filename=None, lr_filename=None, ov_filename=None): + """ Shortcut function to process a file to the SCC. """ + logger.info("Processing file %s, using system %s" % (filename, system_id)) with SCC(settings['basic_credentials'], settings['output_dir'], settings['base_url']) as scc: scc.login(settings['website_credentials']) - measurement_id = scc.upload_file(filename, system_id, force_upload, delete_related) + measurement = scc.process(filename, system_id, + force_upload=force_upload, + delete_related=delete_related, + monitor=monitor, + rs_filename=rs_filename, + lr_filename=lr_filename, + ov_filename=ov_filename) scc.logout() - - return measurement_id - - -def process_file(filename, system_id, force_upload, delete_related, settings): - """ Shortcut function to process a file to the SCC. """ - logger.info("Processing file %s, using system %s." % (filename, system_id)) - - with SCC(settings['basic_credentials'], settings['output_dir'], settings['base_url']) as scc: - scc.login(settings['website_credentials']) - measurement = scc.process(filename, system_id, force_upload, delete_related) - scc.logout() - return measurement -def delete_measurement(measurement_id, settings, delete_related): - """ Shortcut function to delete a measurement from the SCC. """ - logger.info("Deleting %s." % measurement_id) +def delete_measurements(measurement_ids, delete_related, settings): + """ Shortcut function to delete measurements from the SCC. """ + with SCC(settings['basic_credentials'], settings['output_dir'], settings['base_url']) as scc: + scc.login(settings['website_credentials']) + for m_id in measurement_ids: + logger.info("Deleting %s" % m_id) + scc.delete_measurement(m_id, delete_related) + scc.logout() + + +def rerun_all(measurement_ids, monitor, settings): + """ Shortcut function to rerun measurements from the SCC. """ with SCC(settings['basic_credentials'], settings['output_dir'], settings['base_url']) as scc: scc.login(settings['website_credentials']) - scc.delete_measurement(measurement_id, delete_related) + for m_id in measurement_ids: + logger.info("Rerunning all products for %s" % m_id) + scc.rerun_all(m_id, monitor) scc.logout() -def rerun_all(measurement_id, monitor, settings): +def rerun_processing(measurement_ids, monitor, settings): """ Shortcut function to delete a measurement from the SCC. """ - logger.info("Rerunning all products for %s" % measurement_id) with SCC(settings['basic_credentials'], settings['output_dir'], settings['base_url']) as scc: scc.login(settings['website_credentials']) - scc.rerun_all(measurement_id, monitor) + for m_id in measurement_ids: + logger.info("Rerunning (optical) processing for %s" % m_id) + scc.rerun_elpp(m_id, monitor) + scc.logout() + + +def list_measurements(settings, station=None, system=None, start=None, stop=None, upload_status=None, + preprocessing_status=None, + optical_processing=None): + """List all available measurements""" + with SCC(settings['basic_credentials'], settings['output_dir'], settings['base_url']) as scc: + scc.login(settings['website_credentials']) + ret = scc.list_measurements(station=station, system=system, start=start, stop=stop, upload_status=upload_status, + processing_status=preprocessing_status, optical_processing=optical_processing) + for entry in ret: + print("%s" % entry.id) scc.logout() -def rerun_elpp(measurement_id, monitor, settings): - """ Shortcut function to delete a measurement from the SCC. """ - logger.info("Rerunning (optical) processing for %s" % measurement_id) - +def download_measurements(measurement_ids, download_preproc, download_optical, download_graph, settings): + """Download all measurements for the specified IDs""" with SCC(settings['basic_credentials'], settings['output_dir'], settings['base_url']) as scc: scc.login(settings['website_credentials']) - scc.rerun_elpp(measurement_id, monitor) + for m_id in measurement_ids: + if download_preproc: + logger.info("Downloading preprocessed files for '%s'" % m_id) + scc.download_elpp(m_id) + logger.info("Complete") + if download_optical: + logger.info("Downloading optical files for '%s'" % m_id) + scc.download_elda(m_id) + logger.info("Complete") + if download_graph: + logger.info("Downloading profile graph files for '%s'" % m_id) + scc.download_plots(m_id) + logger.info("Complete") scc.logout() -def import_settings(config_file_path): +def settings_from_path(config_file_path): """ Read the configuration file. The file should be in YAML syntax.""" if not os.path.isfile(config_file_path): - logger.error("Wrong path for configuration file (%s)" % config_file_path) - sys.exit(1) + raise argparse.ArgumentTypeError("Wrong path for configuration file (%s)" % config_file_path) with open(config_file_path) as yaml_file: try: settings = yaml.safe_load(yaml_file) logger.debug("Read settings file(%s)" % config_file_path) - except Exception as e: - logger.error("Could not parse YAML file (%s)" % config_file_path) - logger.debug("Error message: {}".format(e)) - sys.exit(1) + except Exception: + raise argparse.ArgumentTypeError("Could not parse YAML file (%s)" % config_file_path) # YAML limitation: does not read tuples settings['basic_credentials'] = tuple(settings['basic_credentials']) @@ -631,22 +817,133 @@ return settings +# Setup for command specific parsers +def setup_delete(parser): + def delete_from_args(parsed): + delete_measurements(parsed.IDs, + delete_related=False, + settings=parsed.config) + + parser.add_argument("IDs", nargs="+", help="measurement IDs to delete.") + parser.set_defaults(execute=delete_from_args) + + +def setup_rerun_all(parser): + def rerun_all_from_args(parsed): + rerun_all(parsed.IDs, parsed.process, parsed.config) + + parser.add_argument("IDs", nargs="+", help="Measurement IDs to rerun.") + parser.add_argument("-p", "--process", help="Wait for the results of the processing.", + action="store_true") + parser.set_defaults(execute=rerun_all_from_args) + + +def setup_rerun_elpp(parser): + def rerun_processing_from_args(parsed): + rerun_processing(parsed.IDs, parsed.process, parsed.config) + + parser.add_argument("IDs", nargs="+", help="Measurement IDs to rerun the processing on.") + parser.add_argument("-p", "--process", help="Wait for the results of the processing.", + action="store_true") + parser.set_defaults(execute=rerun_processing_from_args) + + +def setup_upload_file(parser): + """ Upload but do not monitor processing progress. """ + def upload_file_from_args(parsed): + process_file(parsed.filename, parsed.system, parsed.config, + monitor=parsed.process, + force_upload=parsed.force_upload, + delete_related=False, # For now, use this as default + rs_filename=parsed.radiosounding, + ov_filename=parsed.overlap, + lr_filename=parsed.lidarratio) + + parser.add_argument("filename", help="Measurement file name or path.") + parser.add_argument("system", help="Processing system id.") + parser.add_argument("-p", "--process", help="Wait for the processing results.", + action="store_true") + parser.add_argument("--force_upload", help="If measurement ID exists on SCC, delete before uploading.", + action="store_true") + parser.add_argument("--radiosounding", default=None, help="Radiosounding file name or path") + parser.add_argument("--overlap", default=None, help="Overlap file name or path") + parser.add_argument("--lidarratio", default=None, help="Lidar ratio file name or path") + + parser.set_defaults(execute=upload_file_from_args) + + +def setup_list_measurements(parser): + def list_measurements_from_args(parsed): + # TODO: Fix this + logger.warning("This method needs to be updated. Cross-chceck any results.") + + list_measurements(parsed.config, station=parsed.station, system=parsed.system, start=parsed.start, + stop=parsed.stop, + upload_status=parsed.upload_status, preprocessing_status=parsed.preprocessing_status, + optical_processing=parsed.optical_processing_status) + + def status(arg): + if -127 <= int(arg) <= 127: + return arg + else: + raise argparse.ArgumentTypeError("Status must be between -127 and 127") + + def date(arg): + if re.match(r'\d{4}-\d{2}-\d{2}', arg): + return arg + else: + raise argparse.ArgumentTypeError("Date must be in format 'YYYY-MM-DD'") + + parser.add_argument("--station", help="Filter for only the selected station") + parser.add_argument("--system", help="Filter for only the selected station") + parser.add_argument("--start", help="Filter for only the selected station", type=date) + parser.add_argument("--stop", help="Filter for only the selected station", type=date) + parser.add_argument("--upload-status", help="Filter for only the selected station", type=status) + parser.add_argument("--preprocessing-status", help="Filter for only the selected station", type=status) + parser.add_argument("--optical-processing-status", help="Filter for only the selected station", type=status) + parser.set_defaults(execute=list_measurements_from_args) + + +def setup_download_measurements(parser): + def download_measurements_from_args(parsed): + # TODO: Fix this + logger.warning("This method needs to be updated. Cross-chceck any results.") + + preproc = parsed.download_elpp + optical = parsed.download_elda + graphs = parsed.download_profile_graphs + if not preproc and not graphs: + optical = True + download_measurements(parsed.IDs, preproc, optical, graphs, parsed.config) + + parser.add_argument("IDs", help="Measurement IDs that should be downloaded.", nargs="+") + parser.add_argument("--download-preprocessed", action="store_true", help="Download preprocessed files.") + parser.add_argument("--download-optical", action="store_true", + help="Download optical files (default if no other download is used).") + parser.add_argument("--download-profile-graphs", action="store_true", help="Download profile graph files.") + parser.set_defaults(execute=download_measurements_from_args) + + def main(): # Define the command line arguments. parser = argparse.ArgumentParser() - parser.add_argument("config", help="Path to configuration file") - parser.add_argument("filename", nargs='?', help="Measurement file name or path.", default='') - parser.add_argument("system", nargs='?', help="Processing system id.", default=0) - parser.add_argument("-p", "--process", help="Wait for the results of the processing.", - action="store_true") - parser.add_argument("--delete", help="Measurement ID to delete.") - # parser.add_argument("--delete_related", help= - # "Delete all related measurements. Use only if you know what you are doing!", - # action="store_true") - parser.add_argument("--force_upload", help="If measurement ID exists on SCC, delete before uploading.", - action="store_true") - parser.add_argument("--rerun-all", help="Rerun all processing steps for the provided measurement ID.") - parser.add_argument("--rerun-elpp", help="Rerun low-resolution processing steps for the provided measurement ID.") + subparsers = parser.add_subparsers() + + delete_parser = subparsers.add_parser("delete", help="Deletes a measurement.") + rerun_all_parser = subparsers.add_parser("rerun-all", help="Rerun all processing steps for the provided measurement IDs.") + rerun_processing_parser = subparsers.add_parser("rerun-elpp", + help="Rerun low-resolution processing steps for the provided measurement ID.") + upload_file_parser = subparsers.add_parser("upload-file", help="Submit a file and, optionally, download the output products.") + list_parser = subparsers.add_parser("list", help="List measurements registered on the SCC.") + download_parser = subparsers.add_parser("download", help="Download selected measurements.") + + setup_delete(delete_parser) + setup_rerun_all(rerun_all_parser) + setup_rerun_elpp(rerun_processing_parser) + + setup_upload_file(upload_file_parser) + setup_list_measurements(list_parser) + setup_download_measurements(download_parser) # Verbosity settings from http://stackoverflow.com/a/20663028 parser.add_argument('-d', '--debug', help="Print debugging information.", action="store_const", @@ -656,29 +953,21 @@ dest="loglevel", const=logging.WARNING ) - args = parser.parse_args() + # Setup default config location + home = os.path.expanduser("~") + default_config_location = os.path.abspath(os.path.join(home, ".scc_access.yaml")) + parser.add_argument("-c", "--config", help="Path to the config file.", type=settings_from_path, + default=default_config_location) - # For now, don to allow to delete related measurements - delete_related = False + args = parser.parse_args() # Get the logger with the appropriate level logging.basicConfig(format='%(levelname)s: %(message)s', level=args.loglevel) - settings = import_settings(args.config) + # Dispatch to appropriate function + args.execute(args) + - # If the arguments are OK, try to log-in to SCC and upload. - if args.delete: - # If the delete is provided, do nothing else - delete_measurement(args.delete, settings, delete_related) - elif args.rerun_all: - rerun_all(args.rerun_all, args.process, settings) - elif args.rerun_elpp: - rerun_elpp(args.rerun_elpp, args.process, settings) - else: - if (args.filename == '') or (args.system == 0): - parser.error('Provide a valid filename and system parameters.\nRun with -h for help.\n') - - if args.process: - process_file(args.filename, args.system, args.force_upload, delete_related, settings) - else: - upload_file(args.filename, args.system, args.force_upload, delete_related, settings) +# When running through terminal +if __name__ == '__main__': + main()