Article sub-classes
The following sub-classes derive from the article class.
johnny5.biography(I[, Itype]) | Class for biographies of real people.
johnny5.place(I[, Itype]) | Places (includes methods to get coordinates).
johnny5.band(I[, Itype]) | Class for music bands.
johnny5.song(I[, Itype]) | Class for songs.
biography
-
class johnny5.biography(I, Itype=None)
Class for biographies of real people.
-
L()
Returns the number of language editions of the article.
Returns: L : int
Number of Wikipedia language editions this article exists in.
-
alive(boolean=False)
Determines whether the biography is about a living or a dead person, using the WikiProject Biography template on the article's Talk page.
Returns: alive : str
Either 'yes' or 'no'.
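The Talk-page check described above can be sketched as follows. This is a minimal illustration, not johnny5's implementation: it assumes the Talk page wikitext is already available as a string and that the WikiProject Biography template carries a `living=` parameter. The helper `living_status` is hypothetical.

```python
import re


def living_status(talk_wikitext):
    """Return 'yes' or 'no' from the living= parameter of the WikiProject
    Biography template, or None if the template or parameter is missing.

    Hypothetical helper for illustration; not part of johnny5.
    """
    # Locate the template call; the non-greedy match stops at the first '}}',
    # so templates nested inside it are not handled in this sketch.
    template = re.search(r"\{\{\s*WikiProject Biography(.*?)\}\}",
                         talk_wikitext, flags=re.IGNORECASE | re.DOTALL)
    if template is None:
        return None
    # Look for a living=... parameter inside the template body.
    living = re.search(r"\|\s*living\s*=\s*(\w+)", template.group(1),
                       flags=re.IGNORECASE)
    if living is None:
        return None
    value = living.group(1).lower()
    if value in ("yes", "y"):
        return "yes"
    if value in ("no", "n"):
        return "no"
    return None  # any other value is treated as unknown


print(living_status("{{WikiProject Biography|living=no|class=FA}}"))  # no
```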
-
content(lang='en')
Returns the content of the Wikipedia page in the selected language, in Wikipedia markup.
Parameters: lang : str (default='en')
Language edition.
Returns: content : str
Content of the page in the given language, in Wikipedia markup.
-
creation_date(lang=None)
Gets the creation date of the different Wikipedia language editions. The Wikipedia API requires this data to be requested one page at a time, so batching pages into a list gives no speed-up.
Parameters: lang : str (optional)
Language to get the creation date for.
Returns: timestamp : str or dict
Timestamp in the format ‘2002-07-26T04:32:17Z’. If lang is not provided it will return a dictionary with languages as keys and timestamps as values.
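Because the timestamp is an ISO 8601 string in UTC, it can be turned into a `datetime` with the standard library. A minimal sketch, assuming the exact format shown above; the helper name and the sample dictionary values are illustrative, not real creation dates.

```python
from datetime import datetime, timezone


def parse_creation_timestamp(ts):
    """Parse a timestamp such as '2002-07-26T04:32:17Z' into a UTC datetime."""
    # The trailing 'Z' means UTC, so attach the UTC timezone explicitly.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# With lang=None the method returns a dict like this one (illustrative values).
timestamps = {"en": "2002-07-26T04:32:17Z", "de": "2003-01-05T10:00:00Z"}
parsed = {lang: parse_creation_timestamp(ts) for lang, ts in timestamps.items()}
oldest = min(parsed, key=parsed.get)  # edition with the earliest creation date
print(oldest, parsed[oldest].isoformat())  # en 2002-07-26T04:32:17+00:00
```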
-
curid()
Returns the English curid of the article, fetching it if it was not provided.
-
curid_nonen()
Gets the curid in a non-English language. The curid is a string of the form 'lang.curid'.
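A string of the form 'lang.curid' can be split back into its parts. A minimal sketch, assuming the numeric page ID follows the first dot (Wikipedia language codes such as `zh-yue` contain hyphens but no dots); the helper `split_curid_nonen` and the ID `12345` are hypothetical.

```python
def split_curid_nonen(curid):
    """Split a non-English curid of the form 'lang.curid' into (lang, int).

    Hypothetical helper; assumes the numeric part follows the first dot.
    """
    lang, page_id = curid.split(".", 1)
    return lang, int(page_id)


print(split_curid_nonen("es.12345"))  # ('es', 12345)
```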
-
data_wd()
Returns the metadata about the Wikidata page.
-
data_wp()
Returns the metadata about the Wikipedia page.
-
death_date(raw=False)
Gets the death date from the infobox. If the date is not in the infobox, or cannot be parsed, it falls back to Wikidata.
Parameters: raw : boolean (False)
If True, it also returns the raw text from the infobox.
Returns: d : tuple
(yyyy, mm, dd)
t : str (if raw)
Raw text from the infobox.
-
dump(path='', file_name=None)
Dumps the object to a file.
-
extract(lang='en')
Returns the page extract (a brief description).
Parameters: lang : str (default='en')
Language edition to get the extract from.
Returns: extract : str
Wikipedia page extract.
-
find_article()
Finds the article by trying different capitalizations of the title.
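One way to generate capitalization variants is to try each word both lower-cased and capitalized. This is a sketch of the general idea only; the exact combinations johnny5 tries are not documented here, and the helper `capitalization_variants` is hypothetical.

```python
from itertools import product


def capitalization_variants(title):
    """Yield capitalization variants of a title, word by word, without duplicates.

    Hypothetical helper; johnny5 may try a different set of combinations.
    """
    words = title.split(" ")
    # For each word, try it fully lower-cased and with only its first letter upper-cased.
    options = [(w.lower(), w[:1].upper() + w[1:].lower()) for w in words]
    seen = set()
    for combo in product(*options):
        candidate = " ".join(combo)
        if candidate not in seen:  # e.g. '1984' has only one variant
            seen.add(candidate)
            yield candidate


print(list(capitalization_variants("the beatles")))
# ['the beatles', 'the Beatles', 'The beatles', 'The Beatles']
```

The number of candidates grows as 2**n with the number of words, so this brute-force approach only suits short titles. On most Wikipedias the first letter of a title is case-insensitive, so variants that differ only in the first letter resolve to the same page.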
-
html_soup()
Gets the HTML of the English Wikipedia page, parsed as a BeautifulSoup object.
-
image_url()
Gets the URL of the image shown in the infobox. It iterates over the language editions, ordered by Wikipedia size, until it finds one.
Returns: img_url : str
Full URL of the image.
-
infobox(lang='en', force=False)
Returns the infobox of the article.
Parameters: lang : str (default='en')
Language edition to get the infobox from.
force : boolean (False)
If True, it 'forces' the search for the infobox by taking the template most similar to an infobox. Recommended only for non-English editions.
-
langlinks(lang=None)
Returns the langlinks (links to other language editions) of the article.
Parameters: lang : str (optional)
Language to get the link for.
Returns: out : str or dict
If a language is provided, returns the title of the page in that language. Otherwise, returns a dictionary with languages as keys and titles as values.
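The dictionary of titles can be turned into page URLs with the usual `https://<lang>.wikipedia.org/wiki/<Title>` pattern. A minimal sketch; the helper `wikipedia_urls` is hypothetical and is not a description of how the undocumented `url` method works.

```python
from urllib.parse import quote


def wikipedia_urls(langlinks):
    """Turn a {lang: title} dict into {lang: URL}.

    Hypothetical helper using the usual https://<lang>.wikipedia.org/wiki/<Title> pattern.
    """
    urls = {}
    for lang, title in langlinks.items():
        # Spaces become underscores; other unsafe characters are percent-encoded ('/' is kept).
        urls[lang] = f"https://{lang}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"
    return urls


print(wikipedia_urls({"en": "Albert Einstein", "de": "Albert Einstein"})["de"])
# https://de.wikipedia.org/wiki/Albert_Einstein
```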
-
occupation(C=None, return_all=False, override_train=False)
Uses the occupation classifier Occ to predict the occupation. This method runs slowly when C is not passed, because the classifier has to be loaded on every call. Instead, load the classifier once and reuse it:
>>> C = johnny5.Occ()
>>> article.occupation(C=C)
Parameters: C : johnny5.Occ (optional)
Occupation classifier included in johnny5. If not provided, this method will be slow.
return_all : boolean (False)
If True, it returns the probabilities for all occupations as a list of 2-tuples.
override_train : boolean (False)
If True, it runs the classifier even if the biography belongs to the training set.
Returns: label : str
Most likely occupation.
prob_ratio : float
Ratio between the probabilities of the most likely and the second most likely occupations. If the biography belongs to the training set, it returns prob_ratio=0.
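The relationship between the label and prob_ratio can be sketched from a list of per-occupation probabilities. This assumes each 2-tuple is (occupation, probability), which is an assumption about the return_all output; the helper `top_label_and_ratio` and the labels and probabilities below are illustrative only.

```python
def top_label_and_ratio(probs):
    """Return the most likely label and the ratio of its probability to the runner-up's.

    Hypothetical helper; probs is a list of (occupation, probability) 2-tuples
    with at least two entries.
    """
    ranked = sorted(probs, key=lambda pair: pair[1], reverse=True)
    (label, p1), (_, p2) = ranked[0], ranked[1]
    # Avoid dividing by zero when the runner-up has probability 0.
    ratio = p1 / p2 if p2 > 0 else float("inf")
    return label, ratio


probs = [("politician", 0.25), ("physicist", 0.5), ("chemist", 0.125), ("writer", 0.125)]
print(top_label_and_ratio(probs))  # ('physicist', 2.0)
```

A large ratio means the classifier clearly prefers one occupation; a ratio near 1 means the top two are nearly tied.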
-
pageviews(start_date, end_date=None, lang='en', cdate_override=False, daily=False, get_previous=True)
Gets the pageviews between the given dates for the given language editions. Unless get_previous=False, it also checks whether the English page had any other titles and includes their pageviews.
Parameters: start_date : str
Start date in format 'yyyy-mm'. If start_date=None is passed, it gets all the pageviews for that edition.
end_date : str
End date in format 'yyyy-mm'. If it is not provided, it gets pageviews up to today.
lang : str ('en')
Language edition to get the pageviews for. If lang=None is passed, it gets the pageviews for all language editions.
cdate_override : boolean (False)
If True, it also gets the pageviews from before the creation date.
daily : boolean (False)
If True, it returns daily pageviews.
get_previous : boolean (True)
If True, it searches for all previous titles of the page and gets their pageviews as well. Only works for English.
Returns: views : pandas.DataFrame
Table with columns year, month, day (only when daily=True) and views.
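The 'yyyy-mm' strings accepted by start_date and end_date define a range of months. The sketch below enumerates that range, treating end_date as inclusive and defaulting it to the current month; both choices are assumptions, since the inclusivity of end_date is not documented. The helper `months_between` is hypothetical.

```python
from datetime import date


def months_between(start, end=None):
    """List 'yyyy-mm' strings from start to end, inclusive (end defaults to the current month).

    Hypothetical helper illustrating the date format, not part of johnny5.
    """
    y, m = map(int, start.split("-"))
    if end is None:
        today = date.today()
        end_y, end_m = today.year, today.month
    else:
        end_y, end_m = map(int, end.split("-"))
    months = []
    while (y, m) <= (end_y, end_m):
        months.append(f"{y:04d}-{m:02d}")
        # Advance one month, rolling over into the next year after December.
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return months


print(months_between("2015-11", "2016-02"))
# ['2015-11', '2015-12', '2016-01', '2016-02']
```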
-
previous_titles()
Gets all the previous titles of the page. Currently only works for English.
Returns: titles : set
Collection of previous titles
-
redirect()
Handles redirects if the page has one.
-
revisions(user=True)
Gets the timestamps of the edit history of the Wikipedia article.
Parameters: user : boolean (True)
If True, it returns the user who made each edit as well as the edit timestamp.
-
section(section_title)
Returns the content of the given section of the English Wikipedia page.
Parameters: section_title : str
Title of the section.
Returns: content : str
Content of the section in WikiMarkup
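Extracting a section from wiki markup amounts to finding its heading and stopping at the next heading of the same or a higher level. A regex-based sketch under that assumption; it is not johnny5's implementation, the helper `section_content` is hypothetical, and the sample text is illustrative.

```python
import re

# Matches wiki headings such as '== Early life ==' or '=== School ==='.
HEADING = re.compile(r"^(=+)[ \t]*(.*?)[ \t]*\1[ \t]*$", flags=re.MULTILINE)


def section_content(wikitext, section_title):
    """Return the wikitext under the heading section_title, up to the next heading of
    the same or a higher level (fewer '=' signs). Returns None if the heading is absent.

    Hypothetical helper; not johnny5's implementation.
    """
    headings = list(HEADING.finditer(wikitext))
    for i, match in enumerate(headings):
        if match.group(2) != section_title:
            continue
        level = len(match.group(1))
        end = len(wikitext)
        for nxt in headings[i + 1:]:
            if len(nxt.group(1)) <= level:  # sub-sections stay inside the section
                end = nxt.start()
                break
        return wikitext[match.end():end].strip()
    return None


text = "Intro.\n== Early life ==\nBorn in Ulm.\n=== School ===\nMunich.\n== Career ==\nPatent office.\n"
print(section_content(text, "Career"))  # Patent office.
```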
-
tables(i=None)
Gets the tables in the page.
Parameters: i : int (optional)
Position of the table to get. If not provided, it returns a list of all tables.
Returns: tables : list or pandas.DataFrame
The parsed tables found in the page.
-
title()
Returns the title of the article, fetching it if it was not provided.
-
url(wiki='wp', lang='en')
-
wd_prop(prop)
Gets the requested Wikidata property.
Parameters: prop : str
Wikidata code for the property.
Returns: props : list
List of values for the given property.
Examples
To get the date of birth of Albert Einstein, run:
>>> b = johnny5.article('Q937')
>>> b.wd_prop('P569')
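Wikidata stores dates as time values such as `+1879-03-14T00:00:00Z`, using `00` for an unknown month or day at reduced precision. Whether wd_prop returns dates in this raw form is an assumption here; if it does, a small parser can turn them into (yyyy, mm, dd) tuples. The helper `parse_wikidata_time` is hypothetical.

```python
import re


def parse_wikidata_time(value):
    """Convert a Wikidata time string like '+1879-03-14T00:00:00Z' to (yyyy, mm, dd).

    Month or day '00' (reduced precision) becomes None. Hypothetical helper.
    """
    match = re.match(r"^([+-])(\d+)-(\d{2})-(\d{2})T", value)
    if match is None:
        raise ValueError(f"not a Wikidata time value: {value!r}")
    sign, year, month, day = match.groups()
    year = int(year) * (-1 if sign == "-" else 1)  # a leading '-' marks BCE years
    return (year, int(month) or None, int(day) or None)


print(parse_wikidata_time("+1879-03-14T00:00:00Z"))  # (1879, 3, 14)
print(parse_wikidata_time("+1879-00-00T00:00:00Z"))  # (1879, None, None)
```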
-
wdid()
Returns the Wikidata ID (wdid) of the article, fetching it if it was not provided.
-
wiki_links(section_title=None)
Gets all the Wikipedia pages linked from the article. Only Wikipedia pages are returned.
Returns: titles : set
Set of titles for the Wikipedia pages linked from the article.
-
place
-
class johnny5.place(I, Itype=None)
Places (includes methods to get coordinates).
-
L()
Returns the number of language editions of the article.
Returns: L : int
Number of Wikipedia language editions this article exists in.
-
content(lang='en')
Returns the content of the Wikipedia page in the selected language, in Wikipedia markup.
Parameters: lang : str (default='en')
Language edition.
Returns: content : str
Content of the page in the given language, in Wikipedia markup.
-
coords(wiki='wp')
Gets the coordinates from either Wikipedia or Wikidata.
Parameters: wiki : str
Wiki to use: 'wp' (Wikipedia) or 'wd' (Wikidata). Default is 'wp'.
-
country(GAPI_KEY=None, name=False)
Uses the Google Places API to get the country of the given place.
Parameters: GAPI_KEY : str
Name of the environment variable that holds the API key.
name : boolean (False)
If True, it returns the name of the country.
Returns: ccode : str
Country code.
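Note that GAPI_KEY names an environment variable rather than holding the key itself. The sketch below shows that lookup pattern with the standard library; the helper `resolve_api_key` and the variable name `MY_GOOGLE_KEY` are hypothetical.

```python
import os


def resolve_api_key(env_var_name):
    """Return the API key stored in the environment variable named env_var_name.

    Hypothetical helper; illustrates that GAPI_KEY is a variable *name*, not the key.
    """
    key = os.environ.get(env_var_name)
    if not key:
        raise KeyError(f"environment variable {env_var_name!r} is not set")
    return key


os.environ["MY_GOOGLE_KEY"] = "example-key"  # placeholder value, not a real key
print(resolve_api_key("MY_GOOGLE_KEY"))  # example-key
```

Under this reading, a call to johnny5 passes the variable name, for example `p.country(GAPI_KEY='MY_GOOGLE_KEY')` where `p` is a `johnny5.place` instance.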
-
creation_date(lang=None)
Gets the creation date of the different Wikipedia language editions. The Wikipedia API requires this data to be requested one page at a time, so batching pages into a list gives no speed-up.
Parameters: lang : str (optional)
Language to get the creation date for.
Returns: timestamp : str or dict
Timestamp in the format ‘2002-07-26T04:32:17Z’. If lang is not provided it will return a dictionary with languages as keys and timestamps as values.
-
curid()
Returns the English curid of the article, fetching it if it was not provided.
-
curid_nonen()
Gets the curid in a non-English language. The curid is a string of the form 'lang.curid'.
-
data_wd()
Returns the metadata about the Wikidata page.
-
data_wp()
Returns the metadata about the Wikipedia page.
-
dump(path='', file_name=None)
Dumps the object to a file.
-
extract(lang='en')
Returns the page extract (a brief description).
Parameters: lang : str (default='en')
Language edition to get the extract from.
Returns: extract : str
Wikipedia page extract.
-
find_article()
Finds the article by trying different capitalizations of the title.
-
html_soup()
Gets the HTML of the English Wikipedia page, parsed as a BeautifulSoup object.
-
image_url()
Gets the URL of the image shown in the infobox. It iterates over the language editions, ordered by Wikipedia size, until it finds one.
Returns: img_url : str
Full URL of the image.
-
infobox(lang='en', force=False)
Returns the infobox of the article.
Parameters: lang : str (default='en')
Language edition to get the infobox from.
force : boolean (False)
If True, it 'forces' the search for the infobox by taking the template most similar to an infobox. Recommended only for non-English editions.
-
langlinks(lang=None)
Returns the langlinks (links to other language editions) of the article.
Parameters: lang : str (optional)
Language to get the link for.
Returns: out : str or dict
If a language is provided, returns the title of the page in that language. Otherwise, returns a dictionary with languages as keys and titles as values.
-
pageviews(start_date, end_date=None, lang='en', cdate_override=False, daily=False, get_previous=True)
Gets the pageviews between the given dates for the given language editions. Unless get_previous=False, it also checks whether the English page had any other titles and includes their pageviews.
Parameters: start_date : str
Start date in format 'yyyy-mm'. If start_date=None is passed, it gets all the pageviews for that edition.
end_date : str
End date in format 'yyyy-mm'. If it is not provided, it gets pageviews up to today.
lang : str ('en')
Language edition to get the pageviews for. If lang=None is passed, it gets the pageviews for all language editions.
cdate_override : boolean (False)
If True, it also gets the pageviews from before the creation date.
daily : boolean (False)
If True, it returns daily pageviews.
get_previous : boolean (True)
If True, it searches for all previous titles of the page and gets their pageviews as well. Only works for English.
Returns: views : pandas.DataFrame
Table with columns year, month, day (only when daily=True) and views.
-
previous_titles()
Gets all the previous titles of the page. Currently only works for English.
Returns: titles : set
Collection of previous titles
-
redirect()
Handles redirects if the page has one.
-
revisions(user=True)
Gets the timestamps of the edit history of the Wikipedia article.
Parameters: user : boolean (True)
If True, it returns the user who made each edit as well as the edit timestamp.
-
section(section_title)
Returns the content of the given section of the English Wikipedia page.
Parameters: section_title : str
Title of the section.
Returns: content : str
Content of the section in WikiMarkup
-
tables(i=None)
Gets the tables in the page.
Parameters: i : int (optional)
Position of the table to get. If not provided, it returns a list of all tables.
Returns: tables : list or pandas.DataFrame
The parsed tables found in the page.
-
title()
Returns the title of the article, fetching it if it was not provided.
-
url(wiki='wp', lang='en')
-
wd_prop(prop)
Gets the requested Wikidata property.
Parameters: prop : str
Wikidata code for the property.
Returns: props : list
List of values for the given property.
Examples
To get the date of birth of Albert Einstein, run:
>>> b = johnny5.article('Q937')
>>> b.wd_prop('P569')
-
wdid()
Returns the Wikidata ID (wdid) of the article, fetching it if it was not provided.
-
wiki_links(section_title=None)
Gets all the Wikipedia pages linked from the article. Only Wikipedia pages are returned.
Returns: titles : set
Set of titles for the Wikipedia pages linked from the article.
-
band
-
class johnny5.band(I, Itype=None)
Class for music bands. It also links to Spotify. (To do: it should also link to Genius.)
-
L()
Returns the number of language editions of the article.
Returns: L : int
Number of Wikipedia language editions this article exists in.
-
content(lang='en')
Returns the content of the Wikipedia page in the selected language, in Wikipedia markup.
Parameters: lang : str (default='en')
Language edition.
Returns: content : str
Content of the page in the given language, in Wikipedia markup.
-
creation_date(lang=None)
Gets the creation date of the different Wikipedia language editions. The Wikipedia API requires this data to be requested one page at a time, so batching pages into a list gives no speed-up.
Parameters: lang : str (optional)
Language to get the creation date for.
Returns: timestamp : str or dict
Timestamp in the format ‘2002-07-26T04:32:17Z’. If lang is not provided it will return a dictionary with languages as keys and timestamps as values.
-
curid()
Returns the English curid of the article, fetching it if it was not provided.
-
curid_nonen()
Gets the curid in a non-English language. The curid is a string of the form 'lang.curid'.
-
data_wd()
Returns the metadata about the Wikidata page.
-
data_wp()
Returns the metadata about the Wikipedia page.
-
dump(path='', file_name=None)
Dumps the object to a file.
-
extract(lang='en')
Returns the page extract (a brief description).
Parameters: lang : str (default='en')
Language edition to get the extract from.
Returns: extract : str
Wikipedia page extract.
-
find_article()
Finds the article by trying different capitalizations of the title.
-
formation_place()
Gets the formation place of the band, using Wikidata and Wikipedia.
Returns: place_name : str
Name of the formation place. Typically the title of the Wikipedia page corresponding to the place.
country_code : str
3-digit code of the country where the band was formed
lat,lon : (float,float)
Coordinates of the formation place.
-
html_soup()
Gets the HTML of the English Wikipedia page, parsed as a BeautifulSoup object.
-
image_url()
Gets the URL of the image shown in the infobox. It iterates over the language editions, ordered by Wikipedia size, until it finds one.
Returns: img_url : str
Full URL of the image.
-
infobox(lang='en', force=False)
Returns the infobox of the article.
Parameters: lang : str (default='en')
Language edition to get the infobox from.
force : boolean (False)
If True, it 'forces' the search for the infobox by taking the template most similar to an infobox. Recommended only for non-English editions.
-
langlinks(lang=None)
Returns the langlinks (links to other language editions) of the article.
Parameters: lang : str (optional)
Language to get the link for.
Returns: out : str or dict
If a language is provided, returns the title of the page in that language. Otherwise, returns a dictionary with languages as keys and titles as values.
-
pageviews(start_date, end_date=None, lang='en', cdate_override=False, daily=False, get_previous=True)
Gets the pageviews between the given dates for the given language editions. Unless get_previous=False, it also checks whether the English page had any other titles and includes their pageviews.
Parameters: start_date : str
Start date in format 'yyyy-mm'. If start_date=None is passed, it gets all the pageviews for that edition.
end_date : str
End date in format 'yyyy-mm'. If it is not provided, it gets pageviews up to today.
lang : str ('en')
Language edition to get the pageviews for. If lang=None is passed, it gets the pageviews for all language editions.
cdate_override : boolean (False)
If True, it also gets the pageviews from before the creation date.
daily : boolean (False)
If True, it returns daily pageviews.
get_previous : boolean (True)
If True, it searches for all previous titles of the page and gets their pageviews as well. Only works for English.
Returns: views : pandas.DataFrame
Table with columns year, month, day (only when daily=True) and views.
-
previous_titles()
Gets all the previous titles of the page. Currently only works for English.
Returns: titles : set
Collection of previous titles
-
redirect()
Handles redirects if the page has one.
-
revisions(user=True)
Gets the timestamps of the edit history of the Wikipedia article.
Parameters: user : boolean (True)
If True, it returns the user who made each edit as well as the edit timestamp.
-
section(section_title)
Returns the content of the given section of the English Wikipedia page.
Parameters: section_title : str
Title of the section.
Returns: content : str
Content of the section in WikiMarkup
-
spotify_id()
Uses Wikidata to get the Spotify ID of the band.
Returns: spotify_id : str
Spotify ID.
-
spotify_pop()
Gets the Spotify popularity of the band's top 10 songs.
Returns: mean(pop), max(pop), len(pop)
Mean and maximum popularity of the top songs, and the number of songs found.
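The three returned values can be reproduced from a list of per-song popularity scores (Spotify reports popularity on a 0 to 100 scale). A minimal sketch; the helper `popularity_summary`, its behavior for an empty list, and the sample scores are assumptions for illustration.

```python
def popularity_summary(popularities):
    """Return (mean, max, count) for a list of Spotify popularity scores (0 to 100).

    Hypothetical helper mirroring the documented return value mean(pop), max(pop), len(pop).
    """
    if not popularities:
        return None  # assumption: nothing to summarize when no songs were found
    return sum(popularities) / len(popularities), max(popularities), len(popularities)


print(popularity_summary([80, 72, 65, 90]))  # (76.75, 90, 4)
```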
-
tables(i=None)
Gets the tables in the page.
Parameters: i : int (optional)
Position of the table to get. If not provided, it returns a list of all tables.
Returns: tables : list or pandas.DataFrame
The parsed tables found in the page.
-
title()
Returns the title of the article, fetching it if it was not provided.
-
url(wiki='wp', lang='en')
-
wd_prop(prop)
Gets the requested Wikidata property.
Parameters: prop : str
Wikidata code for the property.
Returns: props : list
List of values for the given property.
Examples
To get the date of birth of Albert Einstein, run:
>>> b = johnny5.article('Q937')
>>> b.wd_prop('P569')
-
wdid()
Returns the Wikidata ID (wdid) of the article, fetching it if it was not provided.
-
wiki_links(section_title=None)
Gets all the Wikipedia pages linked from the article. Only Wikipedia pages are returned.
Returns: titles : set
Set of titles for the Wikipedia pages linked from the article.
-
song
-
class johnny5.song(I, Itype=None)
Class for songs.
-
L()
Returns the number of language editions of the article.
Returns: L : int
Number of Wikipedia language editions this article exists in.
-
content(lang='en')
Returns the content of the Wikipedia page in the selected language, in Wikipedia markup.
Parameters: lang : str (default='en')
Language edition.
Returns: content : str
Content of the page in the given language, in Wikipedia markup.
-
creation_date(lang=None)
Gets the creation date of the different Wikipedia language editions. The Wikipedia API requires this data to be requested one page at a time, so batching pages into a list gives no speed-up.
Parameters: lang : str (optional)
Language to get the creation date for.
Returns: timestamp : str or dict
Timestamp in the format ‘2002-07-26T04:32:17Z’. If lang is not provided it will return a dictionary with languages as keys and timestamps as values.
-
curid()
Returns the English curid of the article, fetching it if it was not provided.
-
curid_nonen()
Gets the curid in a non-English language. The curid is a string of the form 'lang.curid'.
-
data_wd()
Returns the metadata about the Wikidata page.
-
data_wp()
Returns the metadata about the Wikipedia page.
-
disambiguate(artist=None)
If the page is a disambiguation page, returns the song it finds among the page's links.
Parameters: artist : str (optional)
If provided, it gets the song associated with the given artist.
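Picking a song from a disambiguation page can be approximated with a title heuristic, since song articles are often titled like 'Title (Artist song)'. This is a hypothetical sketch, not the method johnny5 uses; the helper `pick_song` and the link titles below are illustrative.

```python
def pick_song(link_titles, artist=None):
    """Pick a song page from the links of a disambiguation page.

    Hypothetical heuristic: song articles are often titled like 'Title (Artist song)'.
    """
    candidates = [t for t in link_titles if t.endswith("song)")]
    if artist is not None:
        # Keep only candidates that mention the artist (case-insensitive).
        candidates = [t for t in candidates if artist.lower() in t.lower()]
    return candidates[0] if candidates else None


links = ["Yesterday (2019 film)", "Yesterday (Beatles song)", "Yesterday (Leona Lewis song)"]
print(pick_song(links))  # Yesterday (Beatles song)
print(pick_song(links, artist="Leona Lewis"))  # Yesterday (Leona Lewis song)
```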
-
dump(path='', file_name=None)
Dumps the object to a file.
-
extract(lang='en')
Returns the page extract (a brief description).
Parameters: lang : str (default='en')
Language edition to get the extract from.
Returns: extract : str
Wikipedia page extract.
-
html_soup()
Gets the HTML of the English Wikipedia page, parsed as a BeautifulSoup object.
-
image_url()
Gets the URL of the image shown in the infobox. It iterates over the language editions, ordered by Wikipedia size, until it finds one.
Returns: img_url : str
Full URL of the image.
-
infobox(lang='en', force=False)
Returns the infobox of the article.
Parameters: lang : str (default='en')
Language edition to get the infobox from.
force : boolean (False)
If True, it 'forces' the search for the infobox by taking the template most similar to an infobox. Recommended only for non-English editions.
-
langlinks(lang=None)
Returns the langlinks (links to other language editions) of the article.
Parameters: lang : str (optional)
Language to get the link for.
Returns: out : str or dict
If a language is provided, returns the title of the page in that language. Otherwise, returns a dictionary with languages as keys and titles as values.
-
pageviews(start_date, end_date=None, lang='en', cdate_override=False, daily=False, get_previous=True)
Gets the pageviews between the given dates for the given language editions. Unless get_previous=False, it also checks whether the English page had any other titles and includes their pageviews.
Parameters: start_date : str
Start date in format 'yyyy-mm'. If start_date=None is passed, it gets all the pageviews for that edition.
end_date : str
End date in format 'yyyy-mm'. If it is not provided, it gets pageviews up to today.
lang : str ('en')
Language edition to get the pageviews for. If lang=None is passed, it gets the pageviews for all language editions.
cdate_override : boolean (False)
If True, it also gets the pageviews from before the creation date.
daily : boolean (False)
If True, it returns daily pageviews.
get_previous : boolean (True)
If True, it searches for all previous titles of the page and gets their pageviews as well. Only works for English.
Returns: views : pandas.DataFrame
Table with columns year, month, day (only when daily=True) and views.
-
previous_titles()
Gets all the previous titles of the page. Currently only works for English.
Returns: titles : set
Collection of previous titles
-
redirect()
Handles redirects if the page has one.
-
revisions(user=True)
Gets the timestamps of the edit history of the Wikipedia article.
Parameters: user : boolean (True)
If True, it returns the user who made each edit as well as the edit timestamp.
-
section(section_title)
Returns the content of the given section of the English Wikipedia page.
Parameters: section_title : str
Title of the section.
Returns: content : str
Content of the section in WikiMarkup
-
tables(i=None)
Gets the tables in the page.
Parameters: i : int (optional)
Position of the table to get. If not provided, it returns a list of all tables.
Returns: tables : list or pandas.DataFrame
The parsed tables found in the page.
-
title()
Returns the title of the article, fetching it if it was not provided.
-
url(wiki='wp', lang='en')
-
wd_prop(prop)
Gets the requested Wikidata property.
Parameters: prop : str
Wikidata code for the property.
Returns: props : list
List of values for the given property.
Examples
To get the date of birth of Albert Einstein, run:
>>> b = johnny5.article('Q937')
>>> b.wd_prop('P569')
-
wdid()
Returns the Wikidata ID (wdid) of the article, fetching it if it was not provided.
-
wiki_links(section_title=None)
Gets all the Wikipedia pages linked from the article. Only Wikipedia pages are returned.
Returns: titles : set
Set of titles for the Wikipedia pages linked from the article.
-