Lumen Tools for Researchers

An Introduction to the Lumen Project

What content can I find in the database, and where does it come from?

Since its founding in 2001, the Lumen project (formerly the Chilling Effects Clearinghouse) has collected almost 4 million requests to remove material from the World Wide Web. Today, this archive is an indispensable resource for anyone seeking to understand the global ecosystem of requests to remove content from the Internet. The complaints, indexed by topic and stored in our searchable database, include DMCA takedown notices submitted to the database by the individual senders or recipients, as well as notices received by Internet providers and hosts such as Google, Twitter, Reddit, Wikipedia, WordPress, and others. Aggregating all of these different requests to remove material facilitates the research, study, and mapping of the Internet's removal request landscape. Further, it allows members of the public to see the origin and nature of content removals, and make their own evaluations of them.

Where do the notices come from?

Notice submissions to the Lumen database generally come from two types of sources.

Individual users can submit removal requests they have sent or received, including cease and desist letters and other takedown notices such as, but by no means limited to, DMCA notices.

Businesses that receive notices (like Google, Twitter, WordPress, Reddit and others) have partnered with Lumen to automatically send us all of the removal requests they receive. For more information about these bulk notice submitters' submissions to Lumen, please refer to each company's own website and information pages.

A notice contains "[redacted]" – what is missing?

Lumen staff make a good faith effort to review and redact any potentially sensitive notices that the project receives in order to remove sensitive or personal information from the text of notices. Such information might include phone numbers, email addresses, or allegedly defamatory content. Further, an individual or company submitting a notice directly to the Lumen database may have decided not to share with Lumen, or to keep private, certain pieces of information in the notice.

Please note that for DMCA notices, Lumen does not redact the name of the rightsholder making the request or the URL(s) of the material complained of. Without the location of the complained-of material and the complainant, the notices are meaningless from a public transparency or research perspective, to say nothing of offering no insight as to possible misuse of takedown notices as a vehicle for censorship.

Who is Lumen for?

Lumen is designed for use by lay Internet users curious about a notice they may have encountered, as well as Internet and legal researchers studying larger trends in free expression and content removal online. If you have further ideas about how to use the database, or suggestions about this FAQ, email us at team@lumendatabase.org with the subject line "A Suggestion for Researcher FAQs."

How does it work?

We are excited that you're interested in conducting research using our database of cease and desist notices, and pleased to be able to offer you a powerful new user interface. Most users will find that the web interface will suffice for browsing and discovery within the database. However, for those that need to access larger swaths of data for use or reuse in various applications, we offer our new API. Read on for further information.


BASIC FACTS ABOUT THE DATABASE

API Documentation

The documentation for the Lumen API can be found here.

Formatting

When a query or request is submitted to the database, the system returns a response containing a list of JSON-encoded attributes. Learn more about JSON (JavaScript Object Notation) here. This format is designed to be “machine readable,” and is not necessarily useful to a human reader in its raw form. However, there are many tools for rendering JSON output into a friendlier form, and we recommend finding one that works for you.

Example JSON Request:

curl http://lumendatabase.org/notices/1.json

Example Successful JSON Output:

{
  "dmca":{
    "id":1,
    "title":"Lion King on YouTube",
    "body":null,
    "date_sent":"2013-06-04T19:23:12Z",
    "date_received":"2013-06-05T20:31:44Z",
    "topics":[
      "Anticircumvention (DMCA)",
      "Bookmarks",
      "Lumen"
    ],
    "tags": [
      "tag_1",
      "tag_2"
    ],
    "jurisdictions": [
      "US",
      "CA"
    ],
    "action_taken": "Partial",
    "sender_name": "Joe Lawyer",
    "recipient_name": "Google, Inc.",
    "works": [
      {
        "description": "Lion King Video",
        "copyrighted_urls": [
          { "url": "http://www.example.com/lion_king.mp4" },
          { "url": "http://www.example.com/lion_king.mov" }
        ],
        "infringing_urls": [
          { "url": "http://www.example.com/infringing1" },
          { "url": "http://www.example.com/infringing2" },
          { "url": "http://www.example.com/infringing3" }
        ]
      }
    ]
  }
}
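
As a rough illustration of working with this output programmatically, the following Python sketch parses an abbreviated copy of the example response above and pulls out the title, sender, and allegedly infringing URLs. The function name summarize_notice is hypothetical, and the sketch assumes real responses follow the same shape as the example (a single top-level key wrapping the notice).

```python
import json

# Abbreviated copy of the example response shown above.
raw = """
{
  "dmca": {
    "id": 1,
    "title": "Lion King on YouTube",
    "sender_name": "Joe Lawyer",
    "recipient_name": "Google, Inc.",
    "works": [
      {
        "description": "Lion King Video",
        "infringing_urls": [
          {"url": "http://www.example.com/infringing1"},
          {"url": "http://www.example.com/infringing2"},
          {"url": "http://www.example.com/infringing3"}
        ]
      }
    ]
  }
}
"""


def summarize_notice(text):
    """Return (title, sender name, list of infringing URLs) for one notice."""
    payload = json.loads(text)
    # The example wraps the notice in one top-level key ("dmca").
    # Take whatever that key is rather than hard-coding it (assumption).
    notice = next(iter(payload.values()))
    urls = [
        entry["url"]
        for work in notice.get("works", [])
        for entry in work.get("infringing_urls", [])
    ]
    return notice["title"], notice["sender_name"], urls


title, sender, urls = summarize_notice(raw)
print(title, "-", sender)
for url in urls:
    print("  ", url)
```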

Understanding dates - Unix Timestamps

The Lumen database accepts dates in a variety of formats but always outputs dates in Unix Time, which is the number of seconds elapsed since the beginning of the Unix epoch (00:00:00 UTC on January 1, 1970). This can be quite confusing at first, and we recommend using a Unix Timestamp conversion tool (like this one here) to transform these raw date outputs into something a human can understand.
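
If you prefer to convert dates in a script rather than with an online tool, a minimal Python sketch might look like the following. The helper names to_utc and to_unix_seconds are hypothetical. Because the example output earlier on this page shows date_sent as an ISO 8601 string, the sketch accepts both that form and a Unix timestamp in seconds; check which form your own responses contain.

```python
from datetime import datetime, timezone


def to_utc(value):
    """Convert a Unix timestamp (seconds) or an ISO 8601 'Z' string to UTC."""
    if isinstance(value, (int, float)):
        # Unix time: seconds elapsed since 1970-01-01T00:00:00Z.
        return datetime.fromtimestamp(value, tz=timezone.utc)
    # String form as in the example output, e.g. "2013-06-04T19:23:12Z".
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def to_unix_seconds(moment):
    """Convert a timezone-aware datetime back to whole Unix seconds."""
    return int(moment.timestamp())


sent = to_utc("2013-06-04T19:23:12Z")
print(sent.isoformat())
print(to_unix_seconds(sent))
print(to_utc(to_unix_seconds(sent)) == sent)
```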

Searching the Database

Most users will find that the web interface will suffice for browsing and discovery within the database. However, for those that need to access larger swaths of data or create automated processes to digest data trends, we offer our new API.

Searching the database, whether through the web interface or with the API, is done via full-text search. By default, a search covers all possible notice fields and facets. Searches can also be refined based on specific slices of the database or on specific facets of the data. See the documentation for the applicable notice parameters and metadata.


QUERYING THE DATABASE WITH THE API

Getting an API Key

An authentication key is needed in order to query the database at will via the API. Contact the Lumen staff at team@lumendatabase.org to be provided with one. API queries to the database submitted without a token will be capped at the first 25 results, and at 5 requests per day.
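
This page does not state how the key is attached to a request, so the Python sketch below makes an explicit assumption: that the key is sent in an HTTP header named X-Authentication-Token. Please confirm the exact mechanism in the API documentation before relying on it. The helper build_request is hypothetical and only constructs the request; nothing is sent over the network.

```python
import urllib.request

API_ROOT = "https://www.lumendatabase.org"


def build_request(path, token=None):
    """Build (but do not send) a GET request for the given API path."""
    headers = {"Accept": "application/json"}
    if token:
        # ASSUMPTION: the header name is not given on this page.
        # Verify it against the API documentation.
        headers["X-Authentication-Token"] = token
    return urllib.request.Request(API_ROOT + path, headers=headers)


req = build_request("/notices/search?term=star", token="YOUR_API_KEY")
print(req.full_url)

# To actually send the request, pass it to urllib.request.urlopen(req).
# Without a key, queries are capped at the first 25 results and 5 per day.
```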

Basic search from the command line

To query the database, use your preferred tools for HTTP "get" requests. There are a number of options available, so pick one depending on your research needs.
Examples include:

  • curl - a command-line program available on macOS, Linux, and BSD operating systems. Recent versions of Windows 10 and later also include curl; on older versions of Windows, a separate environment such as Cygwin is needed.
  • wget - dumps the results of the "get" request to a file.

Example search query for batman, where <parameter> is the database field or facet that is the object of the search:


curl -H "Accept: application/json" -H "Content-type: application/json" 'https://www.lumendatabase.org/notices/search?<parameter>=batman'

Here’s a search query for star where term is the parameter.


curl -H "Accept: application/json" -H "Content-type: application/json" 'https://www.lumendatabase.org/notices/search?term=star'

Searches can also combine multiple parameters by linking them with an ampersand. Below, the query combines a search for star where term is the parameter, where batman is the sender_name, and where date_received falls between RANGE1 and RANGE2.


curl -H "Accept: application/json" -H "Content-type: application/json" 'https://www.lumendatabase.org/notices/search?term=star&sender_name=batman&date_received_facet=RANGE1..RANGE2'

Running these search queries through the API lets you limit a search to a period of time and download the results for use and reuse in applications. A complete list of searchable parameters can be found here.
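
For automated processes, it can help to assemble these query strings in code rather than by hand. The Python sketch below builds search URLs equivalent to the curl examples above using only the standard library. The function search_url is hypothetical, and the date_received_facet parameter name and the RANGE1..RANGE2 placeholder are assumptions to check against the parameter list.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.lumendatabase.org/notices/search"


def search_url(**params):
    """Return a search URL with the given parameters joined by ampersands."""
    # Skip parameters that were left as None.
    clean = {name: value for name, value in params.items() if value is not None}
    # urlencode escapes unsafe characters and joins the pairs with "&".
    return SEARCH_URL + "?" + urlencode(clean)


print(search_url(term="star"))
print(search_url(term="star", sender_name="batman"))
# RANGE1..RANGE2 is the placeholder used above, not a real date range.
print(search_url(term="star", date_received_facet="RANGE1..RANGE2"))
```

The resulting URL can be passed to curl, wget, or any other HTTP client.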

Requesting a List of Topics

The database classifies notices into one or more topics, more of which may be added over time. Certain topics are categorized as subtopics of a larger, comprehensive root topic. For example, topics like “DMCA,” “fair use,” and “anti-circumvention” all fall under “Copyright.” Each topic has a unique numerical ID in the database. To request a list of topics, use the following command.


 curl https://www.lumendatabase.org/topics.json 

This command will return results with three pieces of information: 1) the topic's unique ID number, 2) the name of the topic, and 3) either the ID number of the parent topic or null if the topic is a root topic.

Field      Type     Description
id         integer  The unique ID used for the topic_ids array during notice creation
name       string   The topic name
parent_id  integer  The parent topic_id of this topic, or null if this is a root topic
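
Because each topic carries its parent's ID, the flat list can be turned back into a hierarchy. The Python sketch below shows one way to do that. The sample records, including their IDs and the "Defamation" topic, are invented for illustration, and the sketch assumes you have already extracted a list of id/name/parent_id records from the response.

```python
from collections import defaultdict

# Hypothetical sample records; real IDs and names come from topics.json.
topics = [
    {"id": 1, "name": "Copyright", "parent_id": None},
    {"id": 2, "name": "DMCA", "parent_id": 1},
    {"id": 3, "name": "Fair Use", "parent_id": 1},
    {"id": 4, "name": "Defamation", "parent_id": None},
]


def build_tree(records):
    """Group topic records by parent_id (None marks a root topic)."""
    children = defaultdict(list)
    for record in records:
        children[record["parent_id"]].append(record)
    return children


def render(children, parent_id=None, depth=0):
    """Return indented topic names, with subtopics under their parent."""
    lines = []
    siblings = sorted(children.get(parent_id, []), key=lambda r: r["name"])
    for record in siblings:
        lines.append("  " * depth + record["name"])
        lines.extend(render(children, record["id"], depth + 1))
    return lines


tree = build_tree(topics)
print("\n".join(render(tree)))
```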

Searching the notices

On the web interface, search results are paginated once they exceed a certain number of hits. By default, results are sorted by descending relevance. Full-text search results contain the same data as an individually requested notice, with the addition of a score field that indicates the result's relevance to the query term; higher numbers are more relevant. Terms are joined with an 'OR' by default.
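
Since terms are joined with 'OR', a multi-word query can return loosely related notices, and the score field is a convenient way to trim them. The Python sketch below re-ranks and filters a list of results by score. The sample results and the function top_results are hypothetical, and the sketch assumes you have already extracted the list of notice records from the search response.

```python
# Hypothetical search results; only the fields used below are shown.
results = [
    {"id": 101, "title": "Star chart takedown", "score": 0.8},
    {"id": 102, "title": "Star Wars clip", "score": 2.4},
    {"id": 103, "title": "Morning Star article", "score": 1.5},
]


def top_results(records, min_score=0.0, limit=None):
    """Return records with score >= min_score, most relevant first."""
    kept = [r for r in records if r.get("score", 0.0) >= min_score]
    # Higher scores are more relevant, so sort in descending order.
    ranked = sorted(kept, key=lambda r: r.get("score", 0.0), reverse=True)
    # A limit of None keeps every ranked record.
    return ranked[:limit]


for record in top_results(results, min_score=1.0):
    print(record["score"], record["title"])
```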

Downloading Results in Bulk

In order to better manage its resources, Lumen limits requests to its API. For those interested in unlimited access to the database through the API, please see "Getting an API key."

API Terms of Use

The terms of use for the API are available by clicking here.