How to Use a Data-Scraping Tool to Extract Data from Webpages

If you’re copying and pasting things off webpages and manually putting them in spreadsheets, you either don’t know what data scraping (or web scraping) is, or you do know what it is but aren’t really keen on the idea of learning how to code just to save yourself a few hours of clicking.

Either way, there are a lot of no-code data-scraping tools that can help you out, and Data Miner’s Chrome extension is one of the more intuitive options. If you’re lucky, the task you’re trying to do will already be included in the tool’s recipe book, and you won’t even have to go through the point-and-click steps involved in building your own.

How does Data Miner work?

Data Miner helps you get data off of webpages and into nicely-formatted Excel/CSV files by looking through the text of the pages you’ve loaded. That means you’ll need to be at least comfortable enough with HTML to recognize a few patterns, but nothing too extensive. Advanced HTML and/or JavaScript skills will certainly help with some tasks but aren’t necessary for most things. You should also have at least basic spreadsheet skills so you can be sure your output is clean and organized.

1. Set up Data Miner

Using Chrome or another Chromium browser, install the extension. The extension’s pickaxe icon will appear in your toolbar, and clicking it will take you to a page where you can set up an account. The free version gives you 500 scrapes a month, which is probably enough for you unless this is something you do every day.

2. Load the data

First, navigate to the page you want to extract data from. If you have multiple pages of data or some of it is hidden behind buttons, that’s okay – there are ways to deal with that. For now, you’ll just need a representative sample so the program knows what to look for.

3. Check for a recipe

Next, open Data Miner and check the “Public” tab for existing recipes. If you’re on a popular site, someone else may have already created a process to get the data you’re looking for, which would save you quite a bit of time. Sites like Google, Amazon, and Twitter, for example, have lots of recipes available to help you instantly download links, prices, text, and other data. You can test the recipes by clicking the “Run” button to see a preview of the spreadsheet Data Miner generates. You can also tweak existing recipes to fit your needs by hitting the “Edit” button.

4. Page type

Okay, so no premade recipes worked for you. That’s okay; you can make your own. Just click the “New Recipe” button to start.

Your first choice will be “List Page” or “Detail Page.”

Select “List Page” if you’re trying to get multiple rows of data off a single page. For example, you might want to download the link and page title of every search result or get the date and content of posts in a feed. This is probably the most common type and the one we’ll use here as a demo. (The steps for a detail page are essentially the same.)

Select “Detail Page” if you have a lot of different information about one thing on a single page. A product page is a good example: you need to grab its price, description, link, and rating and put it all in a single row.

5. Make your rows

Hit the “Find” button and move your mouse until the yellow selection box covers all the data that you would need for a single entry into your final spreadsheet. For example, if you’re downloading search results you would need to highlight a big enough area to include the title, URL, and description, each of which you can put in separate columns in the next step. To make your selection, hit the Shift key. Don’t worry if you accidentally click; Data Miner saves all your recipe progress even if you navigate away from the page.

You’ll then want to check at least one of the boxes in the “Element’s Classes” or “HTML Element Type” section. Ideally, you’ll see the selection replicate to cover every element on the page that is in the same category as the one you selected.

If you find that the selector isn’t covering everything you need, try selecting just one of the elements and pressing “Select Parent.” This will make the box bigger and probably capture everything you need. If not, you may need to dig into the HTML a little bit and identify the classes and types of the elements you need. When in doubt, hit “Select Parent” until the box is as big as it can get without covering more than one list entry, as this will give you more flexibility when selecting columns.

Data Miner gives you a “View Element’s HTML” option at the bottom and also lets you type in custom selectors. If you want to, say, grab all the links on a page with the class “product,” you could just type in the CSS selector a.product. This is where some basic HTML/CSS knowledge will really come in handy.
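If you’re curious what a selector like a.product actually matches, here is a minimal Python sketch using only the standard library. It is not how Data Miner works internally; the sample HTML and the names ProductLinkCollector and product_links are made up for illustration. The idea is simply that the selector picks every <a> tag whose class list includes “product” and ignores everything else.

```python
from html.parser import HTMLParser


class ProductLinkCollector(HTMLParser):
    """Collects href values from <a> tags whose class list includes 'product'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # An element can carry several classes separated by spaces.
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "product" in classes:
            self.hrefs.append(attr_map.get("href"))


# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = """
<ul>
  <li><a class="product" href="/items/1">Widget</a></li>
  <li><a class="product featured" href="/items/2">Gadget</a></li>
  <li><a class="nav" href="/about">About</a></li>
  <li><span class="product">Not a link</span></li>
</ul>
"""


def product_links(html):
    """Return the hrefs that the selector a.product would match in html."""
    collector = ProductLinkCollector()
    collector.feed(html)
    return collector.hrefs


print(product_links(SAMPLE_HTML))  # ['/items/1', '/items/2']
```

Note that the “nav” link and the <span> are skipped: the selector requires both the tag type and the class to match.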

Once you’re back at the main row menu, you should see a “Row Count” with the number of entries your recipe will create in a spreadsheet. If it’s not catching everything, you’ll need to double-check your row selection.

6. Split your data into columns

Once you have all the data selected for your rows, it’s time to get it all looking nice by subdividing it into different column categories. Every selection you make here should be a subsection of the box you selected for your rows.

To make a column, just type in a name for it and use the “Find” button to select what you want to extract, just as you did for the rows. The most common data will probably be text, URL, or image URL. Getting URLs by hovering over text links can be slightly tricky; you may have to press “Select Parent” until you reach a level where the Element Type is <a>, which is the HTML tag for links.

To make sure you have the right kind of data in your column, just press the eye icon on the right side of each column’s name, next to the number that shows you how many columns have been selected. This will show you a preview of every row entry for that column. If something is off, go back and tweak the tags and types you chose to identify the rows. Don’t be afraid to open up the HTML viewer and check for patterns associated with the data you’re trying to grab.
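To picture the rows-then-columns idea, here is a rough Python sketch of the same logic, assuming a made-up page where each search result is a <div class="result"> containing one link and one paragraph. The markup, the class name, and the function extract_rows are all hypothetical; Data Miner does this for you through its point-and-click interface.

```python
from html.parser import HTMLParser

# Hypothetical markup: each search result is one <div class="result">.
SAMPLE_HTML = """
<div class="result">
  <a href="https://example.com/a">First result</a>
  <p>Description of the first result.</p>
</div>
<div class="result">
  <a href="https://example.com/b">Second result</a>
  <p>Description of the second result.</p>
</div>
"""


class ResultRowParser(HTMLParser):
    """Turns each <div class="result"> into one row with three columns."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # row being built, or None when outside a result
        self._field = None  # column currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "div" and "result" in classes:
            # Row selector: one new row per matching container.
            self._row = {"title": "", "url": "", "description": ""}
        elif self._row is not None and tag == "a":
            # Column selectors only look inside the current row.
            self._row["url"] = attr_map.get("href") or ""
            self._field = "title"
        elif self._row is not None and tag == "p":
            self._field = "description"

    def handle_data(self, data):
        if self._row is not None and self._field:
            self._row[self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "p"):
            self._field = None
        elif tag == "div" and self._row is not None:
            # Assumes results contain no nested <div> elements.
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    parser = ResultRowParser()
    parser.feed(html)
    return parser.rows


for row in extract_rows(SAMPLE_HTML):
    print(row)
```

The number of rows returned plays the same role as the “Row Count” in the recipe editor: if it is lower than the number of entries you can see on the page, the row selector is too narrow.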

7. Tell Data Miner how to get to the next page

If you have multiple pages of data to extract, you probably don’t want to be clicking through every one and running your recipe over and over again. To get around that, just tell Data Miner where to find the navigation button it needs to click to get to the next page. Be careful not to tell it to click something like “Page 2,” as then it’ll just go to, well, Page 2. Again, be sure that you’re selecting an <a> element, and use the Test Navigation button to make sure it’s working.
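Here is a small Python sketch of why the “Next” button is the right target and “Page 2” is not. The SITE dictionary is an invented stand-in for a three-page listing, and crawl is a hypothetical helper, not anything Data Miner exposes.

```python
# Hypothetical site: each page lists its items plus the links in its pager.
SITE = {
    "/p1": {"items": ["A", "B"], "links": {"Page 2": "/p2", "Next": "/p2"}},
    "/p2": {"items": ["C", "D"], "links": {"Page 2": "/p2", "Next": "/p3"}},
    "/p3": {"items": ["E"], "links": {"Page 2": "/p2"}},  # last page: no "Next"
}


def crawl(start, link_text, max_pages=10):
    """Follow the pager link labelled link_text, collecting items on the way."""
    items, visited = [], []
    url = start
    while url is not None and len(visited) < max_pages:
        page = SITE[url]
        visited.append(url)
        items.extend(page["items"])
        # Stops naturally when the page has no link with that label.
        url = page["links"].get(link_text)
    return items, visited


# Following "Next" walks every page once and then stops.
print(crawl("/p1", "Next"))

# Following "Page 2" keeps landing on the same page until max_pages runs out.
print(crawl("/p1", "Page 2", max_pages=4))
```

With “Next,” the visited pages are /p1, /p2, /p3. With “Page 2,” they are /p1, /p2, /p2, /p2, so the same rows get scraped over and over.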

8. Tell Data Miner where to click or scroll to load data

Some pages don’t load data until you click something or scroll down. Luckily, Data Miner can do these things too! Use the “Find” tool at the top (you should be pretty good at that by now) to select the element you need to manipulate, then put the selector into the appropriate box and test it to make sure it works.

Figuring out exactly which selector will activate the element or infinite scrollbar can be tricky, but basic HTML knowledge and some trial and error will get you pretty far here. Most of the things you’ll need to manipulate here are JavaScript-based, but Data Miner only needs to know the CSS selector associated with the action to activate it, so you shouldn’t need to mess around with any code in most cases.

The next step also allows you to add in custom JS to do pretty much whatever you want, but that’s quite advanced and goes beyond what we need for basic scraping.

9. Save and run the recipe

Congratulations! Now it’s time to see if it all came together. Run the recipe on the page you’re on and check the preview to see if your rows and columns are doing what they’re supposed to. If not, you can go back and edit the recipe.

If everything’s behaving as it should, you can use the “Next Page” button to tell the scraper how many pages it should crawl and how fast it should go. (Going too fast may cause the site to flag you as a bot.)
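The two settings here, a page limit and a crawl speed, boil down to a loop with a pause in it. The Python sketch below shows the general idea only; crawl_politely and fetch_page are hypothetical names, and nothing here reflects how Data Miner is actually implemented.

```python
import time


def crawl_politely(fetch_page, urls, max_pages, delay_seconds, sleep=time.sleep):
    """Fetch at most max_pages URLs, pausing delay_seconds between requests.

    fetch_page is any function that takes a URL and returns that page's data.
    sleep can be swapped out in tests so they do not actually wait.
    """
    results = []
    for index, url in enumerate(urls[:max_pages]):
        if index > 0:
            # Wait between pages so requests do not arrive in a rapid burst.
            sleep(delay_seconds)
        results.append(fetch_page(url))
    return results


# Usage with a stand-in fetcher and a very short delay.
pages = crawl_politely(
    fetch_page=lambda url: f"data from {url}",
    urls=["/p1", "/p2", "/p3", "/p4"],
    max_pages=3,
    delay_seconds=0.01,
)
print(pages)  # ['data from /p1', 'data from /p2', 'data from /p3']
```

A longer delay means a slower scrape, but it looks much more like a person clicking through pages.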

Once you have all the data you need, you can choose which file format you’d like to use to download it.
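If you ever want to post-process the scraped rows yourself, CSV is the easiest format to work with. As a rough illustration of what ends up in the file, this Python sketch writes a couple of invented rows to CSV text using the standard csv module; rows_to_csv and the sample data are assumptions made for the example.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Serialize a list of row dictionaries to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample rows; the first title contains a comma on purpose.
rows = [
    {"title": "First, result", "url": "https://example.com/a"},
    {"title": "Second result", "url": "https://example.com/b"},
]

print(rows_to_csv(rows, ["title", "url"]))
```

Values that contain commas are wrapped in quotes automatically, which is what keeps columns from shifting when you open the file in a spreadsheet.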

I’m having trouble; is there an easier way?

If the Data Miner program isn’t working out for you, there are plenty of other data-scraping tools available: ParseHub, Scraper, Octoparse, Import.io, VisualScraper, etc. Some of them may have more intuitive interfaces and more automation, but you’ll still need to know at least a bit about HTML and how the web is organized. What makes Data Miner especially nice for beginners is its crowdsourced recipe library, which could potentially help you avoid even the most minor encounter with code. That, combined with its fairly generous free monthly scrapes package, makes it a very decent tool for most needs.


Andrew Braun

Andrew Braun is a lifelong tech enthusiast with a wide range of interests, including travel, economics, math, data analysis, fitness, and more. He is an advocate of cryptocurrencies and other decentralized technologies, and hopes to see new generations of innovation continue to outdo each other.
