Lumen Tools for Researchers








An Introduction to the Lumen Project



Contents:



  • What is in the database?

  • Where do the notices come from?

  • What is missing?

  • Who is Lumen for?

  • How does it work?

  • API Terms of Use

    What content can I find in the database, and where does it come from?



    Since its founding in 2001, the Lumen project (formerly the Chilling Effects Clearinghouse) has collected almost 4 million requests to remove material from the World Wide Web. Today, this archive is an indispensable resource for anyone seeking to understand the global ecosystem of requests to remove content from the Internet. The complaints, indexed by topic and stored in our searchable database, include DMCA takedown notices submitted to the database by the individual senders or recipients, as well as notices received by Internet providers and hosts such as Google, Twitter, Reddit, Wikipedia, WordPress, and others. Aggregating all of these different requests to remove material facilitates the research, study and mapping of the Internet's removal request landscape. Further, it allows members of the public to see the origin and nature of content removals, and to make their own evaluations of them.



    Where do the notices come from?



    Notice submissions to the Lumen database generally come from two types of sources.


    • Individual users can submit notices they have sent or received, such as cease and desist letters and other takedown requests. These include, but are by no means limited to, DMCA notices.

    • Businesses that receive notices (like Google, Twitter, WordPress, Reddit and others) have partnered with Lumen to automatically send us all of the removal requests they receive. For more information about these bulk submitters' submissions to Lumen, please refer to each company's own website and information pages.




    A notice contains "[redacted]" – what is missing?



    Lumen staff make a good-faith effort to review the notices that the project receives and to redact sensitive or personal information from their text. Such information might include phone numbers, email addresses, or allegedly defamatory content. Further, an individual or company submitting a notice directly to the Lumen database may have decided not to share with Lumen, or to keep private, certain pieces of information in the notice.


    Please note that for DMCA notices, Lumen does not redact the name of the rightsholder making the request or the URL(s) of the material complained of. Without the location of the complained-of material and the complainant, the notices are meaningless from a public transparency or research perspective, to say nothing of offering no insight as to possible misuse of takedown notices as a vehicle for censorship.



    Who is Lumen for?



    Lumen is designed for use by lay Internet users curious about a notice they may have encountered, as well as Internet and legal researchers studying larger trends in free expression and content removal online. If you have further ideas about how to use the database, or suggestions about this FAQ, email us at team@lumendatabase.org with the subject line "A Suggestion for Researcher FAQs."



    How does it work?



    We are excited that you're interested in conducting research using our database of cease and desist notices, and pleased to be able to offer you a powerful new user interface. Most users will find that the web interface will suffice for browsing and discovery within the database. However, for those that need to access larger swaths of data for use or reuse in various applications, we offer our new API. Read on for further information.





    BASIC FACTS ABOUT THE DATABASE



    Contents



    • API Documentation

    • Formatting

    • Understanding dates - Unix Timestamps

    • Searching the database

    API Documentation



    The documentation for the Lumen API can be found here.



    Formatting



    When a query or request is submitted to the database, the system will return a response with a list of JSON-encoded attributes. Learn more about JSON (JavaScript Object Notation) here. This format is designed to be “machine readable,” and is not necessarily useful to a human reader in its raw form. However, there are many tools for rendering JSON output into a friendlier form, and we recommend finding one that works for you.



    Example JSON Request:


    curl -H "User-Agent: SomeUserAgentHere" http://lumendatabase.org/notices/1.json


    Example Successful JSON Output:


    {
      "dmca": {
        "id": 1,
        "title": "Lion King on YouTube",
        "body": null,
        "date_sent": "2013-06-04T19:23:12Z",
        "date_received": "2013-06-05T20:31:44Z",
        "topics": [
          "Anticircumvention (DMCA)",
          "Bookmarks",
          "Lumen"
        ],
        "tags": [
          "tag_1",
          "tag_2"
        ],
        "jurisdictions": [
          "US",
          "CA"
        ],
        "action_taken": "Partial",
        "sender_name": "Joe Lawyer",
        "recipient_name": "Google, Inc.",
        "works": [
          {
            "description": "Lion King Video",
            "copyrighted_urls": [
              { "url": "http://www.example.com/lion_king.mp4" },
              { "url": "http://www.example.com/lion_king.mov" }
            ],
            "infringing_urls": [
              { "url": "http://www.example.com/infringing1" },
              { "url": "http://www.example.com/infringing2" },
              { "url": "http://www.example.com/infringing3" }
            ]
          }
        ]
      }
    }
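
For readers who would rather process a notice in a script than read it raw, here is a minimal Python sketch that pulls the infringing URLs out of a single-notice response. The sample text is a trimmed, hypothetical copy of the example output; the nesting of objects inside "works" and the URL lists is an assumption about the response shape, so check it against a real response before relying on it.

```python
import json

# Hypothetical, trimmed sample modelled on the example output in this
# document. The object nesting is an assumption, not a confirmed schema.
SAMPLE = """
{
  "dmca": {
    "id": 1,
    "title": "Lion King on YouTube",
    "sender_name": "Joe Lawyer",
    "recipient_name": "Google, Inc.",
    "works": [
      {
        "description": "Lion King Video",
        "copyrighted_urls": [
          {"url": "http://www.example.com/lion_king.mp4"},
          {"url": "http://www.example.com/lion_king.mov"}
        ],
        "infringing_urls": [
          {"url": "http://www.example.com/infringing1"},
          {"url": "http://www.example.com/infringing2"},
          {"url": "http://www.example.com/infringing3"}
        ]
      }
    ]
  }
}
"""


def infringing_urls(response_text):
    """Return every infringing URL listed in a single-notice response."""
    payload = json.loads(response_text)
    urls = []
    # The notice sits under a key named after its type ("dmca" in the sample),
    # so iterate over the values rather than hard-coding that key.
    for notice in payload.values():
        for work in notice.get("works", []):
            for entry in work.get("infringing_urls", []):
                urls.append(entry["url"])
    return urls


print(infringing_urls(SAMPLE))
```

The same loop works for "copyrighted_urls" by swapping the key name.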





    Understanding dates - Unix Timestamps



    The Lumen database accepts dates in a variety of formats but always outputs dates in Unix Time, which is the number of seconds elapsed since the beginning of the Unix epoch. This can be quite confusing at first, and we recommend using a Unix Timestamp conversion tool (like this one here) to transform these raw date outputs into something a human can understand.
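
If you prefer to convert dates in code instead of using an online tool, the following Python sketch shows one way to do it. It assumes timestamps are in seconds, as described above. Because the example output earlier in this document shows ISO 8601 strings such as "2013-06-04T19:23:12Z", the hypothetical helper `to_datetime` accepts either form.

```python
from datetime import datetime, timezone


def to_datetime(value):
    """Convert a Unix timestamp (seconds) or an ISO 8601 string to UTC."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    text = str(value).strip()
    if text.lstrip("-").isdigit():
        # A timestamp that arrived as a string of digits.
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    # Older Python versions reject a trailing "Z", so spell out the offset.
    parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: a string without an offset is already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


print(to_datetime(1370373792).isoformat())
print(to_datetime("2013-06-04T19:23:12Z").isoformat())
```

Both calls describe the same instant, 4 June 2013 at 19:23:12 UTC.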



    Searching the Database



    Most users will find that the web interface will suffice for browsing and discovery within the database. However, for those that need to access larger swaths of data or create automated processes to digest data trends, we offer our new API.



    Searching the database, whether through the web interface or with the API, is done via full-text search. By default, a search covers all possible notice fields and facets. Searches can also be refined based on specific slices of the database or on specific facets of the data. See the documentation for the applicable notice parameters and metadata.




    QUERYING THE DATABASE WITH THE API



    Contents



    • Getting an API Key

    • Basic search from the command line

    • Requesting a list of topics

    • Searching notices

    • Rate Limits

    • API Terms of Use

    Getting an API Key



    An authentication key is needed in order to query the database at will via the API. Contact the Lumen staff at team@lumendatabase.org to be provided with one. API queries to the database submitted without a token will be capped at the first 25 results, and at 5 requests per day.

    Basic search from the command line



    To query the database, use your preferred tools for HTTP "get" requests. There are a number of options available, so pick one depending on your research needs.
    Examples include:



    • curl - a command-line program available on macOS, Linux and BSD systems. Recent versions of Windows (Windows 10 and later) also ship with curl; on older versions of Windows, a separate environment such as Cygwin is needed.

    • wget - dumps the results of the "get" request to a file.

    Example search query for Batman, where <parameter> is the database field or facet that is the object of the search:




    curl -H "Accept: application/json" -H "Content-type: application/json" -H "User-Agent: SomeUserAgentHere" 'https://www.lumendatabase.org/notices/search?<parameter>=batman'



    Here’s a search query for star where term is the parameter.




    curl -H "Accept: application/json" -H "Content-type: application/json" -H "User-Agent: SomeUserAgentHere" 'https://www.lumendatabase.org/notices/search?term=star'



    Searches can also combine multiple parameters by linking them with an ampersand. The query below searches for star using the term parameter, restricts sender_name to batman, and limits date_received to the range RANGE1..RANGE2 through the date_received_facet parameter.



    curl -H "Accept: application/json" -H "Content-type: application/json" -H "User-Agent: SomeUserAgentHere" 'https://www.lumendatabase.org/notices/search?term=star&sender_name=batman&date_received_facet=RANGE1..RANGE2'



    Running these search queries through the API lets you restrict a search to a particular period of time and download the results for use and reuse in applications. A complete list of searchable parameters can be found here.
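
For those scripting their searches, the curl commands above can be mirrored in Python with the standard library alone. This is a minimal sketch, not an official client: the helper name `build_search_request` is hypothetical, and it only builds the request so that you can inspect the URL before sending anything.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Search endpoint taken from the curl examples in this document.
SEARCH_ENDPOINT = "https://www.lumendatabase.org/notices/search"


def build_search_request(params, user_agent="SomeUserAgentHere"):
    """Build (but do not send) a GET request for the notice search."""
    # urlencode joins the pairs with "&" and escapes unsafe characters.
    url = SEARCH_ENDPOINT + "?" + urlencode(params)
    headers = {
        "Accept": "application/json",
        "Content-type": "application/json",
        "User-Agent": user_agent,
    }
    return Request(url, headers=headers, method="GET")


request = build_search_request({"term": "star", "sender_name": "batman"})
print(request.full_url)
```

To actually run the query, pass the request to `urllib.request.urlopen` and decode the body with `json.loads`, keeping the rate limits described later in this document in mind.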



    Requesting a List of Topics



    The database classifies notices into one or more topics, more of which may be added over time. Certain topics are categorized as subtopics of a larger, comprehensive root topic. For example, “DMCA,” “fair use,” and “anti-circumvention” all fall under “Copyright.” Each topic has a unique numerical ID in the database. To request a list of topics, use the following command.





    curl -H "User-Agent: SomeUserAgentHere" https://www.lumendatabase.org/topics.json



    This command will return results with three pieces of information: 1) the topic's unique ID number, 2) the name of the topic, and 3) either the ID number of the parent topic or null if the topic is a root topic.












    Field       Type      Description
    id          integer   The unique ID used for the topic_ids array during notice creation
    name        string    The topic name
    parent_id   integer   The parent topic_id of this topic, or "null" if this is a root topic
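
Because each topic carries its parent's ID, the flat list can be regrouped into root topics and their subtopics. The Python sketch below assumes the response has already been decoded into a list of dictionaries with the three fields described above; the records and their IDs are invented for illustration, and `group_by_root` is a hypothetical helper.

```python
# Hypothetical records using the id / name / parent_id fields described
# above. The IDs and the exact topic names are invented.
TOPICS = [
    {"id": 1, "name": "Copyright", "parent_id": None},
    {"id": 2, "name": "DMCA", "parent_id": 1},
    {"id": 3, "name": "Fair Use", "parent_id": 1},
    {"id": 4, "name": "Defamation", "parent_id": None},
]


def group_by_root(topics):
    """Map each root topic name to the names of its direct subtopics."""
    names = {topic["id"]: topic["name"] for topic in topics}
    # Root topics are the ones whose parent_id is null (None in Python).
    tree = {t["name"]: [] for t in topics if t["parent_id"] is None}
    for topic in topics:
        parent_id = topic["parent_id"]
        if parent_id is not None and names.get(parent_id) in tree:
            tree[names[parent_id]].append(topic["name"])
    return tree


print(group_by_root(TOPICS))
```

Subtopics whose parent is not itself a root topic are skipped by this sketch; handle deeper nesting separately if the real topic list contains it.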

    Searching the notices



    On the web interface, search results are paginated once they exceed a certain number of hits. By default, results are sorted by descending relevance. Full-text search results contain the same data as an individually requested notice, with the addition of a score field that expresses the result's relevance to the query term; higher numbers are more relevant. Terms are joined with an 'OR' by default.



    Downloading Results in Bulk



    In order to better manage its resources, Lumen limits requests to its API. For those interested in unlimited access to the database through the API, please see "Getting an API key."

    Rate Limits



    To ensure that all visitors have equal access to the site, we've instituted the following rate limits on the Lumen Database:



    • For json files, up to 1 request per second

    • For notice pages, up to 5 requests per second

    • For search pages, up to 20 requests per minute

    Example: Alex needs to fetch search results in JSON format. In her case, both the JSON rate limit and the Search rate limit apply. She should plan on fetching up to 20 pages per minute as this is the lower limit of the two.
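
Alex's reasoning can be sketched in Python as follows. The limits are the ones listed above; the names `RATE_LIMITS`, `min_interval` and `Throttle` are hypothetical, and spacing requests evenly is only one possible way to stay under the limits, not a mechanism prescribed by Lumen.

```python
import time

# Limits quoted above, expressed as (requests, per_seconds).
RATE_LIMITS = {
    "json": (1, 1),
    "notice": (5, 1),
    "search": (20, 60),
}


def min_interval(*kinds):
    """Seconds to leave between requests when all the given limits apply."""
    # The strictest limit is the one that demands the longest gap.
    return max(
        per_seconds / requests
        for requests, per_seconds in (RATE_LIMITS[kind] for kind in kinds)
    )


class Throttle:
    """Space successive calls to wait() at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Search results in JSON: both the "json" and the "search" limits apply.
print(min_interval("json", "search"))
```

Calling `Throttle(min_interval("json", "search")).wait()` before each request keeps a script at or below 20 search pages per minute.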



    API Terms of Use


    The terms of use for the API are available by clicking here.



    2017 Lumen

