Article sub-classes

The following sub-classes belong to the article class.

johnny5.biography(I[, Itype]) Class for biographies of real people.
johnny5.place(I[, Itype]) Places (includes methods to get coordinates).
johnny5.band(I[, Itype]) Class for music bands.
johnny5.song(I[, Itype]) Class for songs.

biography

class johnny5.biography(I, Itype=None)[source]

Class for biographies of real people.

L()

Returns the number of language editions of the article.

Returns:

L : int

Number of Wikipedia language editions this article exists in.

alive(boolean=False)[source]

Retrieves whether the biography is about a living or dead person, using the WikiProject Biography template on the article's Talk page.

Returns:

alive : str

Returns either ‘yes’ or ‘no’.
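
A minimal usage sketch (the title ‘Marie Curie’ and the output shown are illustrative, assuming the page resolves to a biography):

>>> b = johnny5.biography('Marie Curie')
>>> b.alive()
'no'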

content(lang='en')

Returns the content of the Wikipedia page in the selected language. The output is in Wikipedia markup.

Parameters:

lang : str (default=’en’)

Language

Returns:

content : str

Content for the page in the given language. Content is in WikiMarkup
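
For example, reusing the object from above:

>>> markup = b.content()              # English WikiMarkup
>>> markup_es = b.content(lang='es')  # Spanish edition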

creation_date(lang=None)

Gets the creation date of the different Wikipedia language editions. The Wikipedia API requires this data to be requested one page at a time, so there is no performance gain from collecting pages into a list.

Parameters:

lang : str (optional)

Language to get the creation date for.

Returns:

timestamp : str or dict

Timestamp in the format ‘2002-07-26T04:32:17Z’. If lang is not provided it will return a dictionary with languages as keys and timestamps as values.
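
A sketch of the two return modes (timestamps illustrative):

>>> b.creation_date('en')
'2007-01-13T05:53:38Z'
>>> cdates = b.creation_date()   # no language given: dict keyed by language code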

curid()

Returns the English curid of the article. Will get it if it is not provided.

curid_nonen()

Gets the curid in a non-English language. The curid is a string of the form ‘lang.curid’.

data_wd()

Returns the metadata about the Wikidata page.

data_wp()

Returns the metadata about the Wikipedia page.

death_date(raw=False)[source]

Gets the death date from the infobox. If it is not available in the infobox (or it cannot parse it) it uses Wikidata.

Parameters:

raw : boolean (False)

If True it also returns the raw text from the infobox.

Returns:

d : tuple

(yyyy,mm,dd)

t : string (if raw)

Raw text from the infobox.
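
A usage sketch, assuming the person is deceased and that raw=True returns the date tuple together with the raw infobox text (values illustrative):

>>> b.death_date()
(1934, 7, 4)
>>> d, raw_text = b.death_date(raw=True)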


desc()[source]

One sentence description of the person.

dump(path='', file_name=None)

Dumps the object to a file.

extract(lang='en')

Returns the page extract (brief description).

Parameters:

lang : str (default=’en’)

Language edition to get the infobox from.

Returns:

extract : str

Wikipedia page extract.

find_article()

Find the article by trying different combinations of the title’s capitalization.

html_soup()

Gets the html for the English Wikipedia page parsed as a BeautifulSoup object.

image_url()

Gets the URL for the image that appears in the infobox. It iterates over a list of languages, ordered according to their Wikipedia size, until it finds one.

Returns:

img_url : str

Full URL for the image.

infobox(lang='en', force=False)

Returns the infobox of the article.

Parameters:

lang : str (default=’en’)

Language edition to get the infobox from.

force : boolean (False)

If True it will ‘force’ the search for the infobox by getting the template that is most similar to an infobox. Recommended only for non-English editions.
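
For example, to fetch the English infobox and then force a match on the German edition:

>>> ibox = b.infobox()
>>> ibox_de = b.infobox(lang='de', force=True)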

is_bio()[source]

Classifier for biographies

Returns:

is_bio : boolean

True if page is a biography.

langlinks(lang=None)

Returns the langlinks of the article.

Parameters:

lang : str (optional)

Language to get the link for.

Returns:

out : str or dict

If a language is provided, it will return the name of the page in that language. If no language is provided, it will return a dictionary with the languages as keys and the titles as values.

name()[source]

occupation(C=None, return_all=False, override_train=False)[source]

Uses the occupation classifier Occ to predict the occupation. This function will be slow when C is not passed, since it needs to load the classifier on each call. Instead use:

>>> C = johnny5.Occ()
>>> article.occupation(C=C)

Parameters:

C : johnny5.Occ (optional)

Occupation classifier included in johnny5. If not provided, this function will be slow.

return_all : Boolean (False)

If True it will return the probabilities for all occupations as a list of 2-tuples.

override_train : boolean (False)

If True it will run the classifier even if the given biography belongs to the training set.

Returns:

label : str

Most likely occupation

prob_ratio : float

Ratio between the probabilities of the most likely and the second most likely occupations. If the biography belongs to the training set, it returns prob_ratio=0.
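
Continuing the examples above (and assuming the default call returns the (label, prob_ratio) pair documented here):

>>> C = johnny5.Occ()
>>> label, prob_ratio = b.occupation(C=C)
>>> all_probs = b.occupation(C=C, return_all=True)   # 2-tuples with probabilities for every occupation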

pageviews(start_date, end_date=None, lang='en', cdate_override=False, daily=False, get_previous=True)

Gets the pageviews between the provided dates for the given language editions. Unless specified otherwise, this function checks whether the English page had any other title, and gets the pageviews accordingly.

Parameters:

start_date : str

Start date in format ‘yyyy-mm’. If start_date=None is passed, it will get all the pageviews for that edition.

end_date : str

End date in format ‘yyyy-mm’. If it is not provided it will get pageviews until today.

lang : str (‘en’)

Language edition to get the pageviews for. If lang=None is passed, it will get the pageviews for all language editions.

cdate_override : boolean (False)

If True it will get the pageviews before the creation date

daily : boolean (False)

If True it will return the daily pageviews.

get_previous : boolean (True)

If True it will search for all the previous titles of the pages and get the pageviews for them as well. Only works for English.

Returns:

views : pandas.DataFrame

Table with columns year,month,(day),views.
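
A usage sketch (date ranges illustrative):

>>> views = b.pageviews('2016-01', end_date='2016-12')      # monthly English pageviews
>>> daily = b.pageviews('2016-01', lang=None, daily=True)   # all language editions, daily resolution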

previous_titles()

Gets all the previous titles the page had. Currently this only works for English.

Returns:

titles : set

Collection of previous titles

redirect()

Handles redirects if the page has one.

revisions(user=True)

Gets the timestamps for the edit history of the Wikipedia article.

Parameters:

user : boolean (True)

If True it returns the user who made the edit as well as the edit timestamp.

section(section_title)

Returns the content inside the given section of the English Wikipedia page.

Parameters:

section_title : str

Title of the section.

Returns:

content : str

Content of the section in WikiMarkup
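
For example, assuming the English page has a section titled ‘Early life’:

>>> early_life = b.section('Early life')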

tables(i=None)

Gets tables in the page.

Parameters:

i : int (optional)

Position of the table to get. If not provided it will return a list of tables

Returns:

tables : list or pandas.DataFrame

The parsed tables found in the page.
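
For example:

>>> all_tables = b.tables()   # list of parsed tables
>>> first = b.tables(0)       # only the table at position 0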

title()

Returns the title of the article. Will get it if it is not provided.

url(wiki='wp', lang='en')

wd_prop(prop)

Gets the requested Wikidata property.

Parameters:

prop : str

Wikidata code for the property.

Returns:

props : list

List of values for the given property.

Examples

To get the date of birth of Albert Einstein, run:

>>> b = johnny5.article('Q937')
>>> b.wd_prop('P569')

wdid()

Returns the wdid of the article. Will get it if it is not provided.

Gets all the Wikipedia pages linked from the article. It only returns Wikipedia pages.

Returns:

titles : set

Set of titles for the Wikipedia pages linked from the article.

place

class johnny5.place(I, Itype=None)[source]

Places (includes methods to get coordinates).

L()

Returns the number of language editions of the article.

Returns:

L : int

Number of Wikipedia language editions this article exists in.

content(lang='en')

Returns the content of the Wikipedia page in the selected language. The output is in Wikipedia markup.

Parameters:

lang : str (default=’en’)

Language

Returns:

content : str

Content for the page in the given language. Content is in WikiMarkup

coords(wiki='wp')[source]

Get the coordinates either from Wikipedia or Wikidata.

Parameters:

wiki : string

Wiki to use, either ‘wd’ or ‘wp’. Default is ‘wp’
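
A minimal sketch (the title ‘Paris’ is illustrative; the format of the returned coordinates is not shown here):

>>> p = johnny5.place('Paris')
>>> wp_coords = p.coords()           # coordinates taken from Wikipedia
>>> wd_coords = p.coords(wiki='wd')  # coordinates taken from Wikidata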

country(GAPI_KEY=None, name=False)[source]

Uses google places API to get the country of the given place.

Parameters:

GAPI_KEY : str

Name of the environment variable that has the API key.

name : boolean (False)

If True it returns the name of the country.

Returns:

ccode : str

Country code.
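
For example, assuming the Google Places API key is stored in an environment variable called GOOGLE_API_KEY (a hypothetical name):

>>> code = p.country(GAPI_KEY='GOOGLE_API_KEY')              # country code, e.g. 'FR'
>>> cname = p.country(GAPI_KEY='GOOGLE_API_KEY', name=True)  # country name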

creation_date(lang=None)

Gets the creation date of the different Wikipedia language editions. The Wikipedia API requires this data to be requested one page at a time, so there is no performance gain from collecting pages into a list.

Parameters:

lang : str (optional)

Language to get the creation date for.

Returns:

timestamp : str or dict

Timestamp in the format ‘2002-07-26T04:32:17Z’. If lang is not provided it will return a dictionary with languages as keys and timestamps as values.

curid()

Returns the English curid of the article. Will get it if it is not provided.

curid_nonen()

Gets the curid in a non-English language. The curid is a string of the form ‘lang.curid’.

data_wd()

Returns the metadata about the Wikidata page.

data_wp()

Returns the metadata about the Wikipedia page.

dump(path='', file_name=None)

Dumps the object to a file.

extract(lang='en')

Returns the page extract (brief description).

Parameters:

lang : str (default=’en’)

Language edition to get the infobox from.

Returns:

extract : str

Wikipedia page extract.

find_article()

Find the article by trying different combinations of the title’s capitalization.

html_soup()

Gets the html for the English Wikipedia page parsed as a BeautifulSoup object.

image_url()

Gets the URL for the image that appears in the infobox. It iterates over a list of languages, ordered according to their Wikipedia size, until it finds one.

Returns:

img_url : str

Full URL for the image.

infobox(lang='en', force=False)

Returns the infobox of the article.

Parameters:

lang : str (default=’en’)

Language edition to get the infobox from.

force : boolean (False)

If True it will ‘force’ the search for the infobox by getting the template that is most similar to an infobox. Recommended only for non-English editions.

langlinks(lang=None)

Returns the langlinks of the article.

Parameters:

lang : str (optional)

Language to get the link for.

Returns:

out : str or dict

If a language is provided, it will return the name of the page in that language. If no language is provided, it will return a dictionary with the languages as keys and the titles as values.

pageviews(start_date, end_date=None, lang='en', cdate_override=False, daily=False, get_previous=True)

Gets the pageviews between the provided dates for the given language editions. Unless specified otherwise, this function checks whether the English page had any other title, and gets the pageviews accordingly.

Parameters:

start_date : str

Start date in format ‘yyyy-mm’. If start_date=None is passed, it will get all the pageviews for that edition.

end_date : str

End date in format ‘yyyy-mm’. If it is not provided it will get pageviews until today.

lang : str (‘en’)

Language edition to get the pageviews for. If lang=None is passed, it will get the pageviews for all language editions.

cdate_override : boolean (False)

If True it will get the pageviews before the creation date

daily : boolean (False)

If True it will return the daily pageviews.

get_previous : boolean (True)

If True it will search for all the previous titles of the pages and get the pageviews for them as well. Only works for English.

Returns:

views : pandas.DataFrame

Table with columns year,month,(day),views.

previous_titles()

Gets all the previous titles the page had. Currently this only works for English.

Returns:

titles : set

Collection of previous titles

redirect()

Handles redirects if the page has one.

revisions(user=True)

Gets the timestamps for the edit history of the Wikipedia article.

Parameters:

user : boolean (True)

If True it returns the user who made the edit as well as the edit timestamp.

section(section_title)

Returns the content inside the given section of the English Wikipedia page.

Parameters:

section_title : str

Title of the section.

Returns:

content : str

Content of the section in WikiMarkup

tables(i=None)

Gets tables in the page.

Parameters:

i : int (optional)

Position of the table to get. If not provided it will return a list of tables

Returns:

tables : list or pandas.DataFrame

The parsed tables found in the page.

title()

Returns the title of the article. Will get it if it is not provided.

url(wiki='wp', lang='en')

wd_prop(prop)

Gets the requested Wikidata property.

Parameters:

prop : str

Wikidata code for the property.

Returns:

props : list

List of values for the given property.

Examples

To get the date of birth of Albert Einstein, run:

>>> b = johnny5.article('Q937')
>>> b.wd_prop('P569')

wdid()

Returns the wdid of the article. Will get it if it is not provided.

Gets all the Wikipedia pages linked from the article. It only returns Wikipedia pages.

Returns:

titles : set

Set of titles for the Wikipedia pages linked from the article.

band

class johnny5.band(I, Itype=None)[source]

Class for music bands. It links to Spotify as well. (It should also link to Genius.)

L()

Returns the number of language editions of the article.

Returns:

L : int

Number of Wikipedia language editions this article exists in.

btypes()[source]

Categories this band is an instance of.

content(lang='en')

Returns the content of the Wikipedia page in the selected language. The output is in Wikipedia markup.

Parameters:

lang : str (default=’en’)

Language

Returns:

content : str

Content for the page in the given language. Content is in WikiMarkup

creation_date(lang=None)

Gets the creation date of the different Wikipedia language editions. The Wikipedia API requires this data to be requested one page at a time, so there is no performance gain from collecting pages into a list.

Parameters:

lang : str (optional)

Language to get the creation date for.

Returns:

timestamp : str or dict

Timestamp in the format ‘2002-07-26T04:32:17Z’. If lang is not provided it will return a dictionary with languages as keys and timestamps as values.

curid()

Returns the English curid of the article. Will get it if it is not provided.

curid_nonen()

Gets the curid in a non-English language. The curid is a string of the form ‘lang.curid’.

data_wd()

Returns the metadata about the Wikidata page.

data_wp()

Returns the metadata about the Wikipedia page.

dump(path='', file_name=None)

Dumps the object to a file.

extract(lang='en')

Returns the page extract (brief description).

Parameters:

lang : str (default=’en’)

Language edition to get the infobox from.

Returns:

extract : str

Wikipedia page extract.

find_article()

Find the article by trying different combinations of the title’s capitalization.

formation_place()[source]

Gets the formation place of the band, using Wikidata and Wikipedia.

Returns:

place_name : str

Name of the formation place. Typically the title of the Wikipedia page corresponding to the place.

country_code : str

3-digit code of the country where the band was formed

lat,lon : (float,float)

Coordinates of the formation place.
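
A usage sketch (the title ‘The Beatles’ is illustrative):

>>> band = johnny5.band('The Beatles')
>>> origin = band.formation_place()   # place name, country code and coordinates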

genres()[source]

Genres according to Wikidata

Returns:

genres : list

List of genre names.

html_soup()

Gets the html for the English Wikipedia page parsed as a BeautifulSoup object.

image_url()

Gets the URL for the image that appears in the infobox. It iterates over a list of languages, ordered according to their Wikipedia size, until it finds one.

Returns:

img_url : str

Full URL for the image.

inception()[source]

Band’s creation year

Returns:

year : int

Formation year
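
Continuing the example from above (values illustrative):

>>> band.genres()
['rock and roll', 'pop rock']
>>> band.inception()
1960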

infobox(lang='en', force=False)

Returns the infobox of the article.

Parameters:

lang : str (default=’en’)

Language edition to get the infobox from.

force : boolean (False)

If True it will ‘force’ the search for the infobox by getting the template that is most similar to an infobox. Recommended only for non-English editions.

langlinks(lang=None)

Returns the langlinks of the article.

Parameters:

lang : str (optional)

Language to get the link for.

Returns:

out : str or dict

If a language is provided, it will return the name of the page in that language. If no language is provided, it will return a dictionary with the languages as keys and the titles as values.

pageviews(start_date, end_date=None, lang='en', cdate_override=False, daily=False, get_previous=True)

Gets the pageviews between the provided dates for the given language editions. Unless specified otherwise, this function checks whether the English page had any other title, and gets the pageviews accordingly.

Parameters:

start_date : str

Start date in format ‘yyyy-mm’. If start_date=None is passed, it will get all the pageviews for that edition.

end_date : str

End date in format ‘yyyy-mm’. If it is not provided it will get pageviews until today.

lang : str (‘en’)

Language edition to get the pageviews for. If lang=None is passed, it will get the pageviews for all language editions.

cdate_override : boolean (False)

If True it will get the pageviews before the creation date

daily : boolean (False)

If True it will return the daily pageviews.

get_previous : boolean (True)

If True it will search for all the previous titles of the pages and get the pageviews for them as well. Only works for English.

Returns:

views : pandas.DataFrame

Table with columns year,month,(day),views.

previous_titles()

Gets all the previous titles the page had. Currently this only works for English.

Returns:

titles : set

Collection of previous titles

redirect()

Handles redirects if the page has one.

revisions(user=True)

Gets the timestamps for the edit history of the Wikipedia article.

Parameters:

user : boolean (True)

If True it returns the user who made the edit as well as the edit timestamp.

section(section_title)

Returns the content inside the given section of the English Wikipedia page.

Parameters:

section_title : str

Title of the section.

Returns:

content : str

Content of the section in WikiMarkup

spotify_id()[source]

Uses Wikidata to get the spotify_id of the band.

Returns:

spotify_id : str

Spotify ID.
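
For example, continuing with the band object from above (value illustrative):

>>> band.spotify_id()
'3WrFJ7ztbogyGnTHbHJFl2'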

spotify_pop()[source]

Average popularity of the top 10 songs of the band

Returns:

mean(pop), max(pop), len(pop)

tables(i=None)

Gets tables in the page.

Parameters:

i : int (optional)

Position of the table to get. If not provided it will return a list of tables

Returns:

tables : list or pandas.DataFrame

The parsed tables found in the page.

title()

Returns the title of the article. Will get it if it is not provided.

url(wiki='wp', lang='en')

wd_prop(prop)

Gets the requested Wikidata property.

Parameters:

prop : str

Wikidata code for the property.

Returns:

props : list

List of values for the given property.

Examples

To get the date of birth of Albert Einstein, run:

>>> b = johnny5.article('Q937')
>>> b.wd_prop('P569')

wdid()

Returns the wdid of the article. Will get it if it is not provided.

Gets all the Wikipedia pages linked from the article. It only returns Wikipedia pages.

Returns:

titles : set

Set of titles for the Wikipedia pages linked from the article.

song

class johnny5.song(I, Itype=None)[source]

Class for songs.

L()

Returns the number of language editions of the article.

Returns:

L : int

Number of Wikipedia language editions this article exists in.

content(lang='en')

Returns the content of the Wikipedia page in the selected language. The output is in Wikipedia markup.

Parameters:

lang : str (default=’en’)

Language

Returns:

content : str

Content for the page in the given language. Content is in WikiMarkup

creation_date(lang=None)

Gets the creation date of the different Wikipedia language editions. The Wikipedia API requires this data to be requested one page at a time, so there is no performance gain from collecting pages into a list.

Parameters:

lang : str (optional)

Language to get the creation date for.

Returns:

timestamp : str or dict

Timestamp in the format ‘2002-07-26T04:32:17Z’. If lang is not provided it will return a dictionary with languages as keys and timestamps as values.

curid()

Returns the English curid of the article. Will get it if it is not provided.

curid_nonen()

Gets the curid in a non-English language. The curid is a string of the form ‘lang.curid’.

data_wd()

Returns the metadata about the Wikidata page.

data_wp()

Returns the metadata about the Wikipedia page.

disambiguate(artist=None)[source]

If the provided page is a disambiguation page, it returns the song that it was able to find within the links.

Parameters:

artist : str (optional)

If provided it will get the song associated with the given artist.
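
A usage sketch, assuming the title ‘Yesterday’ resolves to a disambiguation page (titles illustrative):

>>> s = johnny5.song('Yesterday')
>>> match = s.disambiguate(artist='The Beatles')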

dump(path='', file_name=None)

Dumps the object to a file.

extract(lang='en')

Returns the page extract (brief description).

Parameters:

lang : str (default=’en’)

Language edition to get the infobox from.

Returns:

extract : str

Wikipedia page extract.

find_article()[source]

Find the article by trying different combinations of the title.

html_soup()

Gets the html for the English Wikipedia page parsed as a BeautifulSoup object.

image_url()

Gets the URL for the image that appears in the infobox. It iterates over a list of languages, ordered according to their Wikipedia size, until it finds one.

Returns:

img_url : str

Full URL for the image.

infobox(lang='en', force=False)

Returns the infobox of the article.

Parameters:

lang : str (default=’en’)

Language edition to get the infobox from.

force : boolean (False)

If True it will ‘force’ the search for the infobox by getting the template that is most similar to an infobox. Recommended only for non-English editions.

is_song()[source]

langlinks(lang=None)

Returns the langlinks of the article.

Parameters:

lang : str (optional)

Language to get the link for.

Returns:

out : str or dict

If a language is provided, it will return the name of the page in that language. If no language is provided, it will return a dictionary with the languages as keys and the titles as values.

pageviews(start_date, end_date=None, lang='en', cdate_override=False, daily=False, get_previous=True)

Gets the pageviews between the provided dates for the given language editions. Unless specified otherwise, this function checks whether the English page had any other title, and gets the pageviews accordingly.

Parameters:

start_date : str

Start date in format ‘yyyy-mm’. If start_date=None is passed, it will get all the pageviews for that edition.

end_date : str

End date in format ‘yyyy-mm’. If it is not provided it will get pageviews until today.

lang : str (‘en’)

Language edition to get the pageviews for. If lang=None is passed, it will get the pageviews for all language editions.

cdate_override : boolean (False)

If True it will get the pageviews before the creation date

daily : boolean (False)

If True it will return the daily pageviews.

get_previous : boolean (True)

If True it will search for all the previous titles of the pages and get the pageviews for them as well. Only works for English.

Returns:

views : pandas.DataFrame

Table with columns year,month,(day),views.

performer()[source]

previous_titles()

Gets all the previous titles the page had. Currently this only works for English.

Returns:

titles : set

Collection of previous titles

redirect()

Handles redirects if the page has one.

revisions(user=True)

Gets the timestamps for the edit history of the Wikipedia article.

Parameters:

user : boolean (True)

If True it returns the user who made the edit as well as the edit timestamp.

section(section_title)

Returns the content inside the given section of the English Wikipedia page.

Parameters:

section_title : str

Title of the section.

Returns:

content : str

Content of the section in WikiMarkup

tables(i=None)

Gets tables in the page.

Parameters:

i : int (optional)

Position of the table to get. If not provided it will return a list of tables

Returns:

tables : list or pandas.DataFrame

The parsed tables found in the page.

title()

Returns the title of the article. Will get it if it is not provided.

url(wiki='wp', lang='en')

wd_prop(prop)

Gets the requested Wikidata property.

Parameters:

prop : str

Wikidata code for the property.

Returns:

props : list

List of values for the given property.

Examples

To get the date of birth of Albert Einstein, run:

>>> b = johnny5.article('Q937')
>>> b.wd_prop('P569')

wdid()

Returns the wdid of the article. Will get it if it is not provided.

Gets all the Wikipedia pages linked from the article. It only returns Wikipedia pages.

Returns:

titles : set

Set of titles for the Wikipedia pages linked from the article.