<resource schema="purx">
	<meta name="title">purx Proxy Registry</meta>
	<meta name="description">
		purx lets you register your services without running
		a full OAI-PMH endpoint, while still letting you programmatically
		control the VOResource XML.
	</meta>
	<meta name="creationDate">2017-03-27T11:08:00Z</meta>
	<meta name="schema-rank">500</meta>
	<meta name="subject">virtual-observatories</meta>

	<meta name="creator">Demleitner, M.</meta>

	<meta name="contentLevel">Research</meta>
	<meta name="type">Registry</meta>

	<meta name="_longdoc" format="rst"><![CDATA[
		What's this?
		============

		A proxy publishing registry – this lets you simply put up Virtual
		Observatory resource records on a plain webserver or generate them
		programmatically and then enroll their URLs at *this* service.
		We will then make sure the `VO Registry`_ will see your records.
		That's something you want because then your service will show up in
		`popular VO clients`_ like TOPCAT or Aladin.

		.. _VO Registry: http://adsabs.harvard.edu/abs/2014arXiv1407.3083D
		.. _popular VO clients: http://ivoa.net/astronomers/applications.html

		How do I use it?
		================

		First, prepare a registry record.  There are several scenarios:

		* DaCHS users: DaCHS can produce resource records for all your services
		  and tables.  The URL is::
		
		  	<base uri>/getRR/<rdid>/<resourcename>

		  (as in ``.../getRR/__system__/tap/run`` for the TAP service); you
		  can also see a link to that way down on service info pages under
		  “VOResource XML”.
		* Other toolkits: Perhaps your toolkit can produce VOResource already;
		  in particular if you're publishing through TAP, you already have the
		  difficult pieces, capabilities and tables, as they are available
		  on their VOSI endpoints (for TAP 1.0, that's the ``.../capabilities``
		  and ``.../tables`` children of your access URL).  You can simply
		  include that XML into the registry records as described below, adjusting
		  the root elements (which can safely be done using regular expressions).
		  Toolkit authors: Let us know what to write here.
		* `Write registry records from scratch`_
		
		Unless your toolkit already gives you a URL, make the registry record built
		in this way available under an http(s) URL (http is preferred because
		there's less that can fail).
		
		It is highly recommended to make sure the file is being served with
		last-modified headers and support for HTTP if-modified-since (that's the
		case if you use a plain file on a capable web server like apache).  That
		way, our checks for updates (which we do every 80 ks) will consume almost
		no resources at all.
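
		To see what such a conditional request looks like, here is a minimal
		sketch using only the Python standard library.  It is an illustration,
		not purx's actual code; the function names ``build_conditional_request``
		and ``has_changed`` are made up for this example.

```python
from email.utils import formatdate
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def build_conditional_request(url, last_seen_epoch):
    """Return a GET request that asks for url only if it changed
    after last_seen_epoch (seconds since 1970-01-01 UTC)."""
    request = Request(url)
    # HTTP dates are always expressed in GMT.
    request.add_header(
        "If-Modified-Since", formatdate(last_seen_epoch, usegmt=True))
    return request


def has_changed(url, last_seen_epoch, timeout=10):
    """Return True if the server sends a fresh copy, False on 304."""
    try:
        with urlopen(build_conditional_request(url, last_seen_epoch),
                     timeout=timeout):
            return True
    except HTTPError as error:
        # urllib reports 304 Not Modified as an HTTPError.
        if error.code == 304:
            return False
        raise
```

		A server that answers such a request with 304 Not Modified when the
		file is unchanged is doing what this paragraph recommends.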

		The system will work even without last-modified.  You *must*, however,
		manage the updated attribute on your `ri:Resource` element. purx
		will not update the record unless that date changes.
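
		If you generate or edit the record with a script, bumping the updated
		attribute can be automated.  The following is a minimal sketch under
		some assumptions: the root element is written with the ``ri:`` prefix,
		attributes are double-quoted, and ``bump_updated`` is a hypothetical
		helper name that is not part of purx or any toolkit.  It rewrites the
		text directly rather than re-serialising the XML, so namespace
		declarations stay untouched.

```python
import re
from datetime import datetime, timezone

# Matches the updated="..." attribute inside the opening ri:Resource tag.
_UPDATED_RE = re.compile(r'(<ri:Resource\b[^>]*?\supdated=")[^"]*(")')


def bump_updated(record_xml, when=None):
    """Return record_xml with the root element's updated attribute
    set to when (default: now), formatted as a UTC timestamp."""
    when = when or datetime.now(timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    new_xml, count = _UPDATED_RE.subn(
        lambda match: match.group(1) + stamp + match.group(2),
        record_xml, count=1)
    if count != 1:
        raise ValueError("no updated attribute found on ri:Resource")
    return new_xml
```

		Call ``bump_updated`` whenever the content of the record really
		changes, then write the result back to the file you serve.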

		Then submit the URL to the `purx enrollment service`_.  This will
		validate your resource record and generate an IVOA identifier for it
		if all checks out.  It will then send a mail to the address given
		as contact in the resource record with something like an activation
		URL (which is valid for at least 150 ks).  Once that URL is retrieved,
		you should see your record in the common registries within a day or so
		(they first need to harvest purx's OAI-PMH interface; the Registry works
		by pull rather than push).

		You can always check the idea purx has of your service by entering
		your access URL at the `purx status service`_.
		
		.. _purx enrollment service: /purx/q/enroll/custom
		.. _purx status service: /purx/q/urlstatus/form
	

		What Identifier Will I Get?
		===========================

		One of the big advantages of purx is that you don't have to think of
		an authority and claim it.  The downside is that we will assign the
		identifier.  We're trying to be accommodating, though.
		
		Just set the identifier element with an arbitrary (ignored by us)
		authority – ``ivo://ignored/generic/service``, say.  We will then take the
		path part (``generic/service`` in this case; obviously, you should
		pick something descriptive here) and glue that together with our
		authority, ``purx``.  That gives ``ivo://purx/generic/service``, which
		would be your identifier.

		If that identifier is already taken by someone else, we take the longest
		element of the host part of the URL we got the XML from, and stick that
		right behind the authority.  For instance, if we got the document from
		``http://ari.uni-heidelberg.de/doc.xml``, the resulting identifier would be
		``ivo://purx/uni-heidelberg/generic/service``.

		If that still clashes, we try to disambiguate by appending numbers, but
		I'd suspect somebody is trolling us if that happens.

		Also note that the URL-ivoid relationship is fixed once the identifier
		is minted.  Even if you change the identifier at your end, the identifier
		assigned by purx will remain.
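
		The naming rules above can be summarised in a short sketch.  This
		illustrates the description only and is not purx's actual code;
		``mint_ivoid`` and the set ``taken`` are made-up names, and the exact
		form of the numeric suffix is an assumption.

```python
from urllib.parse import urlsplit

AUTHORITY = "purx"


def mint_ivoid(proposed_ivoid, source_url, taken):
    """Return the identifier for a record, following the rules above.

    proposed_ivoid -- identifier from the record (its authority is ignored)
    source_url -- URL the record was retrieved from
    taken -- set of identifiers that are already assigned
    """
    path = urlsplit(proposed_ivoid).path.strip("/")
    candidate = f"ivo://{AUTHORITY}/{path}"
    if candidate not in taken:
        return candidate

    # Clash: prepend the longest dot-separated element of the host name.
    host = urlsplit(source_url).hostname or ""
    longest = max(host.split("."), key=len)
    candidate = f"ivo://{AUTHORITY}/{longest}/{path}"
    if candidate not in taken:
        return candidate

    # Still a clash: append a number (suffix scheme assumed here).
    number = 1
    while f"{candidate}{number}" in taken:
        number += 1
    return f"{candidate}{number}"
```

		With an empty ``taken`` set, ``ivo://ignored/generic/service`` becomes
		``ivo://purx/generic/service``; if that is taken and the record came
		from ``http://ari.uni-heidelberg.de/doc.xml``, the result is
		``ivo://purx/uni-heidelberg/generic/service``.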


		Getting Out of purx
		===================

		purx will regularly re-retrieve your file.  If that fails or if the
		document becomes invalid, purx will, after a couple of weeks, drop your
		service record (technically, it will henceforth publish a “deleted record”
		for it).  It will send two mails explaining the situation to the contact
		person given in the record that it last saw.

		If you want to immediately get rid of the record, just arrange for
		your webserver to return a 403 Forbidden HTTP status code to purx.

		You can always resuscitate your record by resubmitting it to the `purx
		enrollment service`_.  It will receive the ivoid it had before.


		Write Registry Records from Scratch
		===================================

		That's a bit of work, in particular if you want to provide
		table metadata (which is highly recommended).  Before starting
		you should have the following specs handy (in the sense of: searchable;
		don't even try to actually read them):
		
		* `VOResource`_ (basic definitions)
		* `VODataService`_ (definitions of services and table metadata elements)
		* `SimpleDALRegExt`_ (metadata for cone search and S*AP; the equivalent
		  for TAP is TAPRegExt, but you don't need that because your TAP
		  server already spits out ready-made content for that from capabilities).
		
		You can use these to look up explanations for elements you don't understand
		in the sample records.  There's also http://docs.g-vo.org/schemadoc/,
		giving a javadoc-like reference for the schema files used in the VO.

		Now figure out which kind of service (Browser, SCS, TAP, SIAP...) you'd
		like to publish and grab a matching sample record below.  Format it with
		whatever tool you prefer (our recommendation: xmlstarlet) and then just
		edit it, replacing content as appropriate.  Quite a few elements can also
		be deleted if you absolutely cannot find something to put in there.  Plan
		for a couple of rounds of validation (any XSD validator should do, or just
		use purx, though we're not putting any effort into a nice presentation of
		the diagnostics yet).

		Oh, don't sweat the VOSI interfaces that you'll see in the sample records.
		If you don't have them, just remove the corresponding capabilities;
		most of the records below also have “auxiliary” capabilities
		(the ``standardID`` attribute has an ``aux`` in the fragment part).  Unless
		you know that you want such capability declarations, just remove them.

		Here are links to some sample records:

		* A simple browser service:
		  http://dc.zah.uni-heidelberg.de/getRR/dexter/ui/ui
		* A browser service with metadata for the underlying table:
		  http://dc.zah.uni-heidelberg.de/getRR/ucds/ui/ui
		* A standard cone search (SCS) service with a custom browser service:
		  http://dc.zah.uni-heidelberg.de/getRR/amanda/q/cone
		* An image (SIAP) service with a custom browser service:
		  http://dc.zah.uni-heidelberg.de/getRR/lswscans/res/positions/siap
		* A spectral (SSAP) service:
		  http://dc.zah.uni-heidelberg.de/getRR/flashheros/q/ssa
		* A spectral line (SLAP) service:
		  http://dc.zah.uni-heidelberg.de/getRR/toss/q/q
		* A TAP service (but again, don't write capability and tableset by hand,
		  your toolkit can already build these):
		  http://gavo.aip.de/getRR/__system__/tap/run
		* A record for a tutorial (or some other piece of documentation
		  that should be listed on VOTT_):
		  http://dc.zah.uni-heidelberg.de/getRR/tutreg/gavo_addpms/gavo_addpms

		.. _VOResource: http://ivoa.net/Documents/VOResource/
		.. _VODataService: http://ivoa.net/Documents/VODataService/
		.. _SimpleDALRegExt: http://ivoa.net/Documents/SimpleDALRegExt/
		.. _VOTT: https://dc.g-vo.org/VOTT
		
	]]></meta>

	<execute title="re-harvest sources" every="80000" debug="True">
		<job>
			<code>
				enrollment, _ = loadPythonModule(rd.getAbsPath("res/enrollment"))
				enrollment.updateAll()
			</code>
		</job>
	</execute>

	<execute title="dump table" at="0:30">
		<job>
			<code>
				enrollment, _ = loadPythonModule(rd.getAbsPath("res/enrollment"))
				enrollment.dumpIfChanges()
			</code>
		</job>
	</execute>

	<table id="sources" onDisk="True" primary="source_url">
		<meta name="description">A table of source URLs and their metadata.</meta>

		<column name="source_url" type="text"
			ucd="meta.ref.url"
			tablehead="Source URL"
			description="URL to pull the VOResource from"
			displayHint="type=url">
			<property name="anchorText">VOResource XML</property>
		</column>
		<!-- rectimestamp name is required by registry.oaiinter; yes,
		we're exploiting case-insensitivity of SQL here and promise to
		never again have upper case letters in column names.  -->
		<column name="rectimestamp" type="timestamp"
			ucd="time.epoch"
			tablehead="Updated"
			description="Date of harvest of xml_source (compare
				against this on OAI-PMH requests)."
			displayHint="type=humanDate"/>
		<column name="upstream_update" type="timestamp"
			ucd="time.epoch"
			tablehead="Upstream Update"
			description="Date updated according to the upstream record (compare
				against this when determining if re-publication is necessary)."/>
		<column name="modification_date" type="double precision"
			ucd="time.epoch" unit="s"
			tablehead="Modified"
			description="Date in the upstream Last-Modified header (use this to
				create HTTP if-modified-since headers)"
			displayHint="type=humanDate"/>
		<column name="ivoid" type="text"
			ucd="meta.ref.ivoid"
			tablehead="ivoid"
			description="IVOA identifier allocated"/>
		<column name="status" type="text"
			ucd="meta.code"
			tablehead="Status"
			description="Status of record; one of PENDING, OK, FAILn (n being the
			number of consecutive failures when trying to retrieve data), DELETED.">
			<values>
				<option>PENDING</option>
				<option>OK</option>
				<option>FAIL1</option>
				<option>FAIL2</option>
				<option>FAIL3</option>
				<option>FAIL4</option>
				<option>FAIL5</option>
				<option>DELETED</option>
			</values>
		</column>
		<column name="access_code" type="text"
			tablehead="Activation"
			description="Activation code to move record from PENDING to OK."
			verbLevel="40"/>
		<column name="contact_email" type="text"
			tablehead="Contact"
			description="Last known contact e-mail (used when something goes wrong)."
			verbLevel="40"/>
		<column name="title" type="unicode"
			tablehead="Title"
			description="Title of the resource as given in VOResource."/>
		<column name="xml_source" type="unicode"
			tablehead="Source"
			description="Resource record in utf-8 ready for inclusion into
				OAI-PMH"
			verbLevel="40"/>
	</table>

	<table id="public_fields" onDisk="True">
		<meta name="description">The columns of the sources table suitable
			for public consumption</meta>
		
		<LOOP listItems="source_url title rectimestamp upstream_update
				modification_date ivoid status">
			<events>
				<column original="sources.\item"/>
			</events>
		</LOOP>

		<viewStatement>
			CREATE VIEW \curtable AS (
				SELECT \colNames FROM \schema.sources)
		</viewStatement>
	</table>

	<data id="create" auto="False" recreateAfter="create_view">
		<meta name="shortName">purx OAI</meta>
		<make table="sources">
			<script lang="SQL" type="postCreation" name="ensure unique ivoids">
				ALTER TABLE \qName ADD UNIQUE(ivoid)
			</script>
		</make>
	</data>

	<data id="create_view" auto="False">
		<make table="public_fields"/>
	</data>

	<service id="enroll" allowed="custom,static"
		customPage="res/enrollment">
		<meta name="description">Web service to enroll plain HTTP-published
			VOResource records with purx</meta>
		<property key="staticData">static</property>
		<meta name="_related" title="Status service"
			>\internallink{purx/q/urlstatus/form}</meta>
		<meta name="shortName">purx enrolling</meta>
		<meta name="title">purx Enrollment Web Service</meta>

		<publish render="custom" sets="ivo_managed,local"/>

		<nullCore/>
	</service>

	<service id="urlstatus" allowed="qp,form">
		<meta name="title">purx URL Status</meta>
		<meta name="description">Use this service to inspect what
			purx thinks of a record.</meta>
		<meta name="_related" title="Enrollment service"
			>\internallink{purx/q/enroll/custom}</meta>
		<property name="queryField">source_url</property>

		<dbCore queriedTable="public_fields">
			<condDesc buildFrom="source_url"/>
		</dbCore>

		<outputTable verbLevel="30">
			<outputField name="info_url"
					tablehead="purx Info"
					description="A bookmarkable link to the purx status for this
						resource"
					select="source_url">
				<formatter>
					return T.a(href=base.makeSitePath(
						"/purx/q/urlstatus/qp/"+urllib.parse.quote(data)))["purx Info"]
				</formatter>
			</outputField>
		</outputTable>
	</service>

	<service id="pmh" allowed="pubreg.xml">
		<meta name="title">purx Publishing Registry Proxy</meta>
		<meta name="shortName">purx OAI-PMH</meta>
		<meta name="description">
			This is the OAI-PMH endpoint of the purx publishing registry proxy.
			purx lets you publish VOResource records by just putting XML
			onto a web server.  For details, see http://dc.g-vo.org/PURX.
		</meta>
		<meta name="resType">registry</meta>
		<meta name="maxRecords">1000</meta>
		<meta name="full">false</meta>
		<meta name="managedAuthority">purx</meta>
		<meta name="identifier">ivo://purx/registry</meta>
		<meta name="datetimeUpdated">2019-09-24T09:00:00</meta>

		<publish sets="ivo_managed" render="pubreg.xml"/>

		<customCore module="res/purxoai"/>
	</service>

	<resRec id="auth">
		<meta>
			resType: authority
			identifier: ivo://purx
			creationDate: 2017-08-28T09:00:00Z
			datetimeUpdated:2018-06-01T02:10:00
			title:purx Proxy Authority
			creator: Demleitner, M.
			managingOrg: ivo://org.gavo.dc/org
			subject:virtual-observatories
			referenceURL:http://dc.g-vo.org/PURX
			sets:ivo_managed
			contentLevel: Research
		</meta>
		<meta name="description">
			This authority is used by the purx publishing registry proxy and
			hence contains records from (potentially) a multitude of publishers
			that just put their registry records on a common web server.  Problems
			with actual records served here should be reported to the resources'
			contact addresses first.  Non-responsive contacts should be reported
			to the contact persons of this authority.
		</meta>
	</resRec>

	<regSuite title="purx regressions">
		<regTest title="Mandatory records are listed with ivo_managed">
			<url metadataPrefix="ivo_vor" verb="ListIdentifiers"
				>pmh/pubreg.xml</url>
			<code>
				identifiers = set(e.text for e in self.getXpath("//o:identifier"))
				self.assertTrue("ivo://purx" in identifiers)
				self.assertTrue("ivo://purx/registry" in identifiers)
			</code>
		</regTest>

		<regTest title="purx identify looks plausible">
			<url verb="Identify">pmh/pubreg.xml</url>
			<code>
				self.XPATH_NAMESPACE_MAP["vr"
					] = "http://www.ivoa.net/xml/VOResource/v1.0"
				self.assertXpath("o:Identify/o:description/*/capability["
					"@standardID='ivo://ivoa.net/std/registry']/interface", {
					"{http://www.w3.org/2001/XMLSchema-instance}type": "vg:OAIHTTP",
					"role": "std"})
				self.assertXpath("o:Identify/o:description/*/identifier", {
					None: "ivo://purx/registry"})
				self.assertXpath("o:Identify/o:description/*/managedAuthority", {
					None: "purx"})
			</code>
		</regTest>

		<regTest title="purx response valid">
			<url metadataPrefix="ivo_vor" verb="ListRecords" set="ivo_managed"
				>pmh/pubreg.xml</url>
			<code>
				self.assertValidatesXSD()
			</code>
		</regTest>

		<regTest title="purx enrollment service is up">
			<url>enroll/custom</url>
			<code>
				# can we figure out a stronger test?
				self.assertHasStrings("The URL at which to retrieve")
			</code>
		</regTest>

		<regTest title="purx activation service gives a sensible error message.">
			<url>enroll/custom/confirm/junk-code</url>
			<code>
				self.assertHasStrings("no open transaction was found")
			</code>
		</regTest>
	</regSuite>
</resource>
