#HackFSM

 

This page includes information about the API to the Bancroft Library's digitized Free Speech Movement materials. Use of the API implies agreement with its Terms of Service and Disclaimer, given on #HackFSM's main API page (parent to this page).

The API is backed by an Apache Solr instance; the full Solr API is not described on this page. For example, faceted search is not described here, but is described in the Solr documentation. To learn about ways to query the Solr API not covered on this page, consult the Apache Solr Reference Guide (available as a PDF).

Note that authorization credentials (the app_id and app_key parameters in API request URLs) must be submitted as URL parameters in order to access the API. App IDs and Keys will be distributed to teams at the Hackathon kickoff. Interactive documentation -- a web page in which parameters to the API may be entered, and response documents returned when a request with those parameters is sent to the API -- will be linked from this page by the time the Hackathon kickoff begins.


External References:

  • The FSM Digital Archive API is backed by an instance of Apache Solr. The Apache Solr Reference Guide (v4.7) is available in PDF form from multiple mirror sites listed here. The section on Specifying Terms for the Standard Query Parser may be of particular interest to Hackathon participants, as it gives a much more complete picture of how queries in requests to this API can be formulated.
  • Interactive documentation to the FSM Digital Archive API, courtesy of API Central
  • (#HackFSM Mentor) Raymond Yee's IPython notebook, showing examples that programmatically address the FSM Digital Archive API (for teams implementing in languages other than Python, feel free to use these examples as model algorithms, and to ask mentors for help finding analogous examples and libraries in your language/platform)


 

 

Fields retrievable via the FSM Solr API

The fields given in the list below are metadata to items in the FSM Digital Archive. The primary objects they reference are images or texts; texts are marked up with TEI, an XML schema. The primary objects may be accessed via the URLs given in the fsmImageUrl or fsmTeiUrl fields. In addition, some metadata records include reference to the title and URL to archival collection finding aids (inventories) with which the described primary objects (images or TEI-encoded documents) are associated.

Field Name | Definition
id | A unique identifier for each record in the Solr index. No semantic meaning should be inferred from the value of this field.
fsmTitle | Either the formal title or a description of the item.
fsmCreator | Name of the person or corporate body responsible for the creation of the item.
fsmTypeOfResource | Indicates the nature of the material, either still image or text. Sometimes more descriptive, such as newspapers or correspondence.
fsmDateCreated | Date the item was created (irregular text format; see notes, below).
fsmNote | Additional descriptive information.
fsmRelatedTitle | The title of the archival collection of which the item is a part. May include the metacollection titles for the FSM digital archive.
fsmIdentifier | The collection call number or storage information for the item.
fsmRelatedIdentifier | Contains identifiers and links to collection information for the item. Three strings may appear: the URL of the FSM digital archive website, the call number for the physical collection in the library, and the URL of the online collection finding aid (inventory and description) for the collection. Example: http://bancroft.berkeley.edu/FSM/, BANC PIC 2000.067, http://www.oac.cdlib.org/findaid/ark:/13030/tf8f59p055
fsmPhysicalLocation | The name, location, and contact information for the repository where the physical item is located. Note that this field is populated for only some of the records; however, all items returned by the API have the same physical location: The Bancroft Library / University of California, Berkeley / Berkeley, CA 94720-6000 / Phone: (510) 642-6481 / Fax: (510) 642-7589 / Email: bancref@library.berkeley.edu / URL: http://bancroft.berkeley.edu
fsmImageUrl | URLs of the image files. May include up to three versions of a file: thumbnail, medium-resolution JPEG (~750 pixels longest edge), and higher-resolution JPEG (~1500 pixels longest edge). The first URL is usually that of the smallest file. Indicators in the filename have meaning: _i and _a = thumbnail, _j and _b = medium res, and _k and _c = high res. May also include multiple images of the item, as in a back and front cover, or a two-page document.
fsmTeiUrl | The URL of the TEI-encoded text file. TEI is a content-based encoding for text documents; each TEI document contains content-specific tagging and a header.
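
The filename indicators described for fsmImageUrl can be used to pick a particular resolution from a record. The following is a minimal Python sketch, not part of the API itself. It assumes that the field's value has already been split into a list of URL strings and that the indicator is the last two characters of the filename before its extension; both are assumptions. The helper names and example URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Filename indicators as described in the fsmImageUrl definition above.
SIZE_BY_INDICATOR = {
    "_i": "thumbnail", "_a": "thumbnail",
    "_j": "medium", "_b": "medium",
    "_k": "high", "_c": "high",
}

def image_size(url):
    """Return 'thumbnail', 'medium', 'high', or None if no indicator is found.

    Assumption: the indicator is the last two characters of the filename
    before its extension (e.g. ..._k.jpg).
    """
    filename = posixpath.basename(urlparse(url).path)
    stem, _extension = posixpath.splitext(filename)
    return SIZE_BY_INDICATOR.get(stem[-2:])

def pick_image(urls, preferred=("high", "medium", "thumbnail")):
    """Return the first URL whose size is the most preferred one available."""
    for size in preferred:
        for url in urls:
            if image_size(url) == size:
                return url
    return None

# Hypothetical URLs, for illustration only.
example_urls = [
    "http://example.org/fsm/photo0001_i.jpg",
    "http://example.org/fsm/photo0001_j.jpg",
    "http://example.org/fsm/photo0001_k.jpg",
]
print(pick_image(example_urls))
```

Records that hold several images of one item (front and back cover, for example) would need grouping by filename stem first, which this sketch does not attempt.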

 


 

Including Authentication Credentials to obtain API access

The API requires that authentication credentials be submitted with each API request (RESTful APIs are stateless, so there's no 'logged in' session maintained between requests). The app_id and app_key parameters must be included in each URL request to the API; a 403 Forbidden error will be returned if valid values for these parameters are missing from the request.

Here's an example -- the important thing to notice is the inclusion of the two parameters at the end:

https://apis.berkeley.edu/solr/fsm/select?q=Halloran&wt=json&indent=on&app_id=abcdefgh&app_key=12345678901234567890123456789012
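
A request URL like the one above can be assembled programmatically. This is a minimal sketch using only the Python standard library; the function name build_url is hypothetical, and the credential values are the placeholders from the example, not working keys.

```python
from urllib.parse import urlencode

BASE_URL = "https://apis.berkeley.edu/solr/fsm/select"

def build_url(query, app_id, app_key, **params):
    """Build a request URL with the credentials appended as URL parameters."""
    all_params = {"q": query, "wt": "json", "indent": "on"}
    all_params.update(params)          # e.g. rows=100, start=30
    all_params["app_id"] = app_id      # credentials go last, as in the example
    all_params["app_key"] = app_key
    return BASE_URL + "?" + urlencode(all_params)

# Placeholder credentials; substitute the values issued to your team.
url = build_url("Halloran", "abcdefgh", "12345678901234567890123456789012")
print(url)
```

The resulting string can be passed to any HTTP client (for example urllib.request.urlopen) to issue the GET request.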

 


 

Basic Solr URL parameters: query and specify response format

NOTES:

  • The API only responds to the HTTP verb GET: The Bancroft Library is giving access to a read-only collection of materials.
  • The fields described above are metadata. Some of these (cf. fsmImageUrl and fsmTeiUrl) give the URL of the described material, e.g., photo(s) or TEI-encoded XML documents. Thus, for example, identifying an image of interest and obtaining it takes more than one request: a query against the API to retrieve the metadata record, followed by a request for the URL given in that record (and because multiple URLs may be included in the fsmImageUrl field, parsing of that field's contents for the desired image among multiple options may also be required).
  • When specifying field names, be aware that they are case-sensitive (query terms, however, are case-insensitive).

 

Request Param Function Notes
q Query

This parameter holds the query statement. A simple query either does not specify a field (full search over all fields), e.g., q=Mario; or specifies a single field, e.g., q=fsmTitle:Mario.

Query terms are NOT case-sensitive. However, query field names ARE case sensitive.

The index returns matches where the search term is in the specified field, e.g., q=fsmTitle:Mario returns records in which Mario occurs anywhere in the title.

Multiple search terms can be included by placing a plus-sign between them (in a URL query string, a plus-sign stands for a space), e.g.: q=Searle+Savio -- this example will find records that contain Searle or Savio. To query for an exact phrase, enclose it in double-quotes, e.g., q="Mario+Savio".

More complex queries, including queries on multiple fields and wildcards, are described in the Advanced query tips and examples section below.

wt Response format

The value of this param can be any of the following; xml is the default if the parameter is omitted.

  • xml
  • json
  • csv
  • ruby
  • python
  • php

Response formats can be examined by sending a request to the API. A convenient URL might be the one used to generate the attached zip file of response document examples, which gives a small number of responses (simply vary the value of the wt parameter to see different formats; naturally, you must also substitute valid API ID and Key values):

https://apis.berkeley.edu/solr/fsm/select?q=fsmTitle:Gardner&wt=json&indent=on&app_id=abcdefgh&app_key=12345678901234567890123456789012
indent "Pretty printed" response format

If this parameter is omitted, or is included and set to the value off, the response contains no extra whitespace for human readability. This is usually an appropriate choice if code is going to be used to parse the API's response document.

If this parameter is included and set to any value other than off, the response will be formatted so that it can be more easily read by humans. This is usually an appropriate choice for those investigating the API manually and inspecting the response document(s) visually.

 

Here's an example. Note that the request queries for records that include the word (name) Halloran; and specifies indented JSON as the response format.

https://apis.berkeley.edu/solr/fsm/select?q=Halloran&wt=json&indent=on&app_id=abcdefgh&app_key=12345678901234567890123456789012

 

Here's another example, in which the only change from the prior example is that the query term is scoped to occurrences of Halloran in the field fsmTitle.

https://apis.berkeley.edu/solr/fsm/select?q=fsmTitle:Halloran&wt=json&indent=on&app_id=abcdefgh&app_key=12345678901234567890123456789012

 

Here's one more example, in which the only change from the prior example is that the query term is scoped to occurrences of Halloran in the field fsmTei, i.e., to occurrence in the text of a document in the collection as opposed to anywhere in a document or metadata (where metadata might describe a photograph of Sgt. Halloran, rather than a document in which his name occurs).

https://apis.berkeley.edu/solr/fsm/select?q=fsmTei:Halloran&wt=json&indent=on&app_id=abcdefgh&app_key=12345678901234567890123456789012
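
Once a response with wt=json has been received, the matching records can be pulled out of it. The sketch below assumes the API passes through Solr's standard JSON layout unchanged (a top-level response object holding numFound, start, and docs). The sample body is hypothetical and heavily abbreviated, and whether a field such as fsmTitle arrives as a string or as a list is not confirmed here.

```python
import json

# Hypothetical, abbreviated response body; real records carry more fields.
sample_body = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "example-1", "fsmTitle": ["Example title one"]},
      {"id": "example-2", "fsmTitle": ["Example title two"]}
    ]
  }
}
"""

def summarize(body):
    """Return (total number of matches, list of records in this response)."""
    result = json.loads(body)["response"]
    return result["numFound"], result["docs"]

total, docs = summarize(sample_body)
for doc in docs:
    # Use .get(): fields with no value may be absent from a record.
    print(doc["id"], doc.get("fsmTitle"))
```

Note that numFound counts all matches for the query, which may exceed the number of records in a single response.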

 


 

Advanced query tips and examples

 

 

Specify number of results returned

 

Request Param | Function | Notes
rows | Specify number of rows to be returned, e.g., &rows=100 | The default number of rows returned is 30.


 

 

Paging results: get specific rows

 
Request Param | Function | Notes
start | Specifies the offset into the complete result set at which the returned documents should begin (i.e., the first record in the response is the record at this offset). E.g., &start=125 | The default offset (i.e., when this parameter is not included) is zero (0). Result sets are indexed beginning with zero, i.e., the first record in the complete result set is record 0 (zero), the second record is record 1, etc.
rows | Specify number of rows to be returned, e.g., &rows=25 | The default number of rows returned is 30.
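
The start and rows parameters combine to walk through a large result set page by page. As a rough sketch (the function name page_params is hypothetical, and the total would in practice come from the first response to the query):

```python
def page_params(total, rows=30):
    """Yield the start/rows pair for each page needed to cover `total` results."""
    for start in range(0, total, rows):
        yield {"start": start, "rows": rows}

# For 70 matching records at the default page size of 30:
pages = list(page_params(total=70, rows=30))
for page in pages:
    print("&start={start}&rows={rows}".format(**page))
```

The last page may come back with fewer than rows records; here the page starting at offset 60 would hold the final 10.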


 

 

 

Query on multiple fields

A set of query terms may include multiple fields, using the operators AND, OR, NOT, + or -. If no operators are used, OR is assumed (default).
 
The + operator requires the search term following the + to exist somewhere in the specified field for a result to be included in the returned set; the - operator excludes results in which the search term following the - appears.
 
Examples (note that spaces are not escaped below, for readability; in actual URLs, they must be replaced by %20 in every case):
  1. q=fsmTitle:Searle AND fsmCreator:Marcus
  2. q=fsmTitle:Searle AND NOT fsmCreator:Marcus
  3. q=fsmTitle:Searle AND -fsmCreator:Marcus
  4. q=fsmTitle:Searle OR fsmCreator:Marcus
  5. q=fsmTitle:Searle fsmCreator:Marcus

Note that examples 2 & 3 return the same result set, as do examples 4 & 5.
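
Because spaces in these queries must be sent as %20, it can be convenient to let a library do the escaping. A minimal Python sketch follows; the helper name encode_query is hypothetical, and keeping the colon unescaped is a readability choice rather than a requirement.

```python
from urllib.parse import quote

def encode_query(query):
    """Percent-encode a query string for use as the value of the q parameter."""
    # Spaces become %20; ':' is left as-is so field:term stays readable.
    return quote(query, safe=":")

examples = [
    "fsmTitle:Searle AND fsmCreator:Marcus",
    "fsmTitle:Searle AND NOT fsmCreator:Marcus",
    "fsmTitle:Searle AND -fsmCreator:Marcus",
    "fsmTitle:Searle OR fsmCreator:Marcus",
    "fsmTitle:Searle fsmCreator:Marcus",
]
for example in examples:
    print("q=" + encode_query(example))
```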


 

 

Specify fields to be returned

 
Request Param | Function | Notes
fl | When this parameter is included, only the fields listed will be returned in a result set (empty fields may not be returned, depending on the response format requested). E.g., &fl=fsmTitle,fsmImageUrl | The set of fields to be returned can be specified as a comma (or space) separated list of field names.
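
Since field names are case-sensitive, a small check before building the fl value can catch typos early. This is only a sketch: the helper name fl_value is hypothetical, and the tuple simply repeats the field names listed on this page (fsmTei is left out, following the recommendation elsewhere on this page not to request it).

```python
FSM_FIELDS = (
    "id", "fsmTitle", "fsmCreator", "fsmTypeOfResource", "fsmDateCreated",
    "fsmNote", "fsmRelatedTitle", "fsmIdentifier", "fsmRelatedIdentifier",
    "fsmPhysicalLocation", "fsmImageUrl", "fsmTeiUrl",
)

def fl_value(fields):
    """Return a comma-separated fl value, rejecting unknown field names."""
    unknown = [name for name in fields if name not in FSM_FIELDS]
    if unknown:
        raise ValueError("unknown field name(s): " + ", ".join(unknown))
    return ",".join(fields)

print("&fl=" + fl_value(["fsmTitle", "fsmImageUrl"]))
```

Passing "fsmtitle" (all lower-case) raises ValueError here, which mirrors the case-sensitivity of field names in the API.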


 
 

 

Use wildcards in queries

Wildcards can be inserted in query terms at the start, middle, or end of a term. Asterisk (*) is the wildcard character. Considering also that query terms are case-insensitive, the following query field:term expressions all return the same result set (but only because none of the materials reference the name "Justin" in the fsmTitle field!):
  • q=fsmTitle:Dustin
  • q=fsmTitle:Dust*n
  • q=fsmTitle:Dusti*
  • q=fsmTitle:*ustin
  • q=fsmTitle:dustin
  • q=fsmTitle:dust*n
  • q=fsmTitle:dusti*
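
To see why these patterns coincide only in the absence of "Justin", the wildcard behaviour can be imitated locally. The sketch below uses Python's fnmatch module purely as an illustration of the pattern semantics for a single term; it is not how Solr evaluates queries, and fnmatch also gives special meaning to ? and [...] that is ignored here.

```python
from fnmatch import fnmatchcase

def matches(pattern, term):
    """Approximate a case-insensitive wildcard match against one term."""
    return fnmatchcase(term.lower(), pattern.lower())

patterns = ["Dustin", "Dust*n", "Dusti*", "*ustin", "dustin", "dust*n", "dusti*"]

for pattern in patterns:
    print(pattern, matches(pattern, "Dustin"), matches(pattern, "Justin"))
```

Every pattern matches "Dustin", but only *ustin would also match "Justin".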

Sorting results

 
Request Param Function Notes
sort | Sort results by Solr-calculated relevancy score. E.g., &sort=score+asc or &sort=score+desc | The score pseudo-field is the only field in this API that can be used to sort a result set. The sort direction can be ascending (asc) or descending (desc).
 
 

Distinguishing images from documents

To programmatically differentiate records that describe images from records that describe TEI-encoded XML documents, the API permits queries that exclude records with a non-null value in the "unwanted" URL field.
 
That is, to retrieve TEI documents only, one would query for null values in the fsmImageUrl field. To retrieve images only, one would query for null values in the fsmTeiUrl field.
 
NOTE: Observe the hyphen prepended to the field names in the examples below. The hyphen (minus sign) functions here as a NOT operator. The notation [* TO *] specifies 'any value' in Solr queries, so -field:[* TO *] matches records in which the field has no value. Spaces are shown unescaped in the examples for readability; in actual URLs they must be replaced by %20. This and much more information about the API's query parser can be found in the Apache Solr Reference Guide (v4.7), available in PDF form from multiple mirror sites listed here -- cf. the section on Specifying Terms for the Standard Query Parser.
 
Example that selects TEI-encoded XML documents by excluding records that have any value in fsmImageUrl:
 
https://apis.berkeley.edu/solr/fsm/select?q=-fsmImageUrl:[* TO *]&wt=json&indent=on&app_id=abcdefgh&app_key=12345678901234567890123456789012
 
Example that selects images by excluding records that have any value in fsmTeiUrl:
 
https://apis.berkeley.edu/solr/fsm/select?q=-fsmTeiUrl:[* TO *]&wt=json&indent=on&app_id=abcdefgh&app_key=12345678901234567890123456789012
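
The same distinction can be drawn client-side on records that have already been retrieved. This sketch assumes each record has been parsed into a Python dict (for example from a wt=json response) and that a null field is either absent or empty; the function name record_kind and the sample records are hypothetical.

```python
def record_kind(doc):
    """Classify a metadata record as 'image', 'text', or 'unknown'."""
    has_image = bool(doc.get("fsmImageUrl"))
    has_tei = bool(doc.get("fsmTeiUrl"))
    if has_image and not has_tei:
        return "image"
    if has_tei and not has_image:
        return "text"
    return "unknown"  # neither or both populated; inspect manually

# Hypothetical records, for illustration only.
photo = {"id": "example-1", "fsmImageUrl": ["http://example.org/p_i.jpg"]}
leaflet = {"id": "example-2", "fsmTeiUrl": ["http://example.org/doc.xml"]}
print(record_kind(photo), record_kind(leaflet))
```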
 

 

Search vs. retrieval of document content

The field fsmTei exists in the Solr index, but is not listed at the top of this page. That's because we recommend it not be specified for inclusion in sets of fields returned in response to a query (using the fl parameter, described above).
 
This field contains a plain-text version of the underlying TEI-encoded XML documents described by metadata that references documents (as opposed to images, for which this field is NULL). Requesting that the field be returned will generate large response documents, and may cause your request to time out.
 
However, use of the fsmTei field in a query parameter is a useful way to specify a search through the text of a document only, as opposed to the text of the document plus its metadata fields.
 
To retrieve the TEI-encoded document, request the document via the URL in the fsmTeiUrl field.
 

 

 

Irregular dates in the API's underlying data

Date data in the field fsmDateCreated are irregular (they do not conform to any standard). The following information is provided to help those who wish to programmatically translate user-input dates and date ranges into query terms that will return user-specified records from this collection. The following is not a comprehensive list of date formats in the fsmDateCreated field, but it does cover most cases.
  • Most dates are of the form three-letter month followed by a period, a space, a day of the month, a comma, a space, and a four-digit year, e.g.: Dec. 7, 1964
  • Some dates are of the form four-letter month followed by a period, a space, a day of the month, a comma, a space, and a four-digit year, e.g.: Sept. 29, 1964
  • Some dates consist only of a four-digit year, e.g.: 1964
  • Some dates consist of a three-letter month, followed by a hyphen, followed by a two-digit (20th century) year, e.g.: May-70
  • Some dates consist of a three-letter month, followed by a period, a comma, a space, and a four-digit year, e.g.: Apr., 1965
  • Some dates consist of a four-letter month, followed by a period, a space, and a four-digit year, e.g.: Sept. 1970
  • Some dates are given in multiple forms, as in this example (delimiter depends on response document format): February 1970,1970-02
  • Some dates are given as a range of days, e.g.: Oct 1-2, 1964
  • Some dates are given as a range of years, e.g.: 1978-1979
  • Some dates are given as an approximation (where "ca." abbreviates circa), e.g.: ca. 1964
  • Some rows contain non-contiguous date data, e.g.: Oct. 13 and 15, 1964
  • Some rows contain a non-date in this field, e.g., not dated
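
Going the other direction, the formats above can be reduced to a year and (where present) a month with a couple of regular expressions. This is a rough sketch covering only the formats listed here, not a complete parser: the function name parse_fsm_date is hypothetical, two-digit years are assumed to be 20th century as noted above, and for ranges only the first year is reported.

```python
import re

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
MONTH_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")[a-z]*", re.IGNORECASE)
YEAR_RE = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")
MONTH_YY_RE = re.compile(r"^\s*([A-Za-z]{3,4})-(\d{2})\s*$")  # e.g. May-70

def parse_fsm_date(text):
    """Return (year, month) for a raw fsmDateCreated string.

    month is None when the string names no month; the result is None
    when no year can be found (e.g. 'not dated').
    """
    short = MONTH_YY_RE.match(text)
    if short and short.group(1)[:3].lower() in MONTHS:
        month = MONTHS.index(short.group(1)[:3].lower()) + 1
        return 1900 + int(short.group(2)), month  # assumes 20th century
    year = YEAR_RE.search(text)
    if not year:
        return None
    month = MONTH_RE.search(text)
    month_number = MONTHS.index(month.group(1).lower()) + 1 if month else None
    return int(year.group(1)), month_number

for raw in ["Dec. 7, 1964", "Sept. 29, 1964", "May-70", "ca. 1964", "not dated"]:
    print(repr(raw), "->", parse_fsm_date(raw))
```

Values this sketch cannot interpret come back as None, so callers can decide whether to skip those records or show the raw text.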


 

 

Where can I learn more about querying the API?

The FSM Digital Archive API is backed by an instance of Apache Solr; the full Solr API is not described on this page. For example, faceted search is not described here, but is described in the Solr documentation. The Apache Solr Reference Guide (v4.7) is available in PDF form from multiple mirror sites listed here. The section on Specifying Terms for the Standard Query Parser may be of particular interest to Hackathon participants, as it gives a much more complete picture of how queries in requests to this API can be formulated.

You can use many tools -- most web browsers, curl, Firefox's Poster plugin, etc. -- to query the API. An especially convenient tool for querying the API with parameters described on this page can be found at UC Berkeley's API Central, where we've provided browser-based interactive documentation to the FSM Digital Archive API. 


 

#HackFSM was a one-time event held in April 2014. The winning site is now available. A white paper on #HackFSM has also been published, with practical information for other libraries that are interested in holding hackathons.