Extracts the list of documents from the Amazon Kindle web page and saves it as txt, xml and html files. The project is stored on GitHub.
How to use:
- Download (or build) KindleLibrary.jar
- Navigate to the Manage Your Content and Devices page on Amazon (tested using Chrome, but I trust it should work with any other web browser)
- Switch Show to Docs
- Scroll down to reach the end of your list (or until you see the Show More button)
- Save the page as html (File -> Save Page As…, using Complete Webpage). Replace the default filename with a short one, e.g. 1.
- If more docs are pending, press the Show More button at the bottom of the page and repeat from step 4 (scrolling down), saving each page under a new name
- When all pages have been saved, open a command line and run the conversion:
Jareks-MBP:Downloads jhartman$ java -jar KindleLibrary.jar 1.htm
Amazon book list extractor
Elements found:400
Saving 1.html
Saving 1.txt
Saving 1.xml
Done!
- Repeat the conversion for every html file saved earlier
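When several pages have been saved, running the conversion command by hand for each one gets tedious. The following is a minimal sketch of a wrapper that runs `java -jar KindleLibrary.jar <file>` once per saved page. It is not part of KindleLibrary: the class name `ConvertAll`, its helper methods and the default `*.htm` file pattern are assumptions made for illustration, and it expects KindleLibrary.jar to be in the current directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of KindleLibrary itself.
class ConvertAll {

    // Builds the command from the usage steps: java -jar KindleLibrary.jar <file>
    static List<String> buildCommand(Path jar, Path savedPage) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jar.toString());
        cmd.add(savedPage.toString());
        return cmd;
    }

    // Lists the regular files in dir that match the glob, sorted by path.
    static List<Path> findSavedPages(Path dir, String glob) throws IOException {
        List<Path> pages = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    pages.add(p);
                }
            }
        }
        Collections.sort(pages);
        return pages;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumptions: first argument is the directory with the saved pages,
        // second argument is the file pattern (default "*.htm").
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String glob = args.length > 1 ? args[1] : "*.htm";
        Path jar = Paths.get("KindleLibrary.jar");

        // The file list is collected before any conversion starts, so files
        // written by the converter during this run are not picked up.
        for (Path page : findSavedPages(dir, glob)) {
            Process process = new ProcessBuilder(buildCommand(jar, page))
                    .inheritIO()   // show the converter's own output
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                System.err.println("Conversion failed for " + page + " (exit code " + exitCode + ")");
            }
        }
    }
}
```

It could be run as `java ConvertAll.java ~/Downloads "*.htm"` on Java 11 or later. Because the converter also writes an html file, it is safer to pick a pattern that matches only the saved input pages.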
An example of the output html and xml is shown below
Libraries & References
- jsoup Java HTML Parser
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.