Scrape public Facebook pages, posts, reviews and comments
Extract public information from Facebook Pages.
There are two main components to take into account if you want to run Facebook Scraper on the Apify platform:
The usage costs depend on each specific case: the list of URLs, the total amount to scrape, the memory you set, the country, etc. When you scrape comments and reviews, the number of posts you can scrape decreases, because each post has its own URL and is scraped separately.
You can find full details on our residential proxy pricing here: https://apify.com/proxy?pricing=residential-ip#pricing.
Set the maxPosts parameter to a reasonable number so that you do not run out of memory and your results are saved. While the scraper scrolls a page, it keeps the partial content in memory until scrolling finishes.
Apify provides a free plan with $5 of platform usage credits and a maximum of 4 GB of actor memory, so you can test your setup and try the actor for free. For a Residential Proxy trial, please contact us at support@apify.com or on Intercom.
Based on Apify’s pricing at the time of writing, the Personal plan ($49) would allow you to scrape about:
Read our tutorial on how to use the scraper. It includes screenshots and examples of how to scrape the Apify Facebook page, along with handy tips and advice on proxy usage.
https://blog.apify.com/how-to-scrape-facebook-pages-posts-comments-photos-and-more-425ebef352d8
Example input. Only startUrls and proxyConfiguration are required (see INPUT_SCHEMA.json for all settings):
{
    "startUrls": [
        { "url": "https://www.facebook.com/apifytech" },
        { "url": "https://www.facebook.com/biz/hotel-supply-service/?place_id=103095856397524" }
    ],
    "language": "en-US",
    "commentsMode": "RANKED_THREADED", // ["RANKED_THREADED", "RECENT_ACTIVITY", "RANKED_UNFILTERED"]
    "maxPosts": 3,
    "maxPostDate": "3 days", // or a static date in ISO format, like 2020-01-01
    "minPostDate": "1 day", // or a static date in ISO format
    "maxPostComments": 15,
    "maxCommentDate": "2020-01-01",
    "maxReviews": 3,
    "maxReviewDate": "2020-01-01",
    "scrapeAbout": true,
    "scrapeReviews": true,
    "scrapePosts": true,
    "scrapeServices": true,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
    }
}
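The two required fields can be checked before a run is started. The following is only a minimal sketch: `buildInput` is a hypothetical helper, not part of the scraper, and its default proxy settings simply mirror the example input above.

```javascript
// Hypothetical helper (not part of the scraper): builds an input object
// and checks the two fields that the example above marks as required.
function buildInput(pageUrls, overrides = {}) {
    if (!Array.isArray(pageUrls) || pageUrls.length === 0) {
        throw new Error('startUrls is required: pass at least one Facebook page URL');
    }
    const input = {
        startUrls: pageUrls.map((url) => ({ url })),
        // Residential proxy group, as in the example input above
        proxyConfiguration: { useApifyProxy: true, apifyProxyGroups: ['RESIDENTIAL'] },
        ...overrides,
    };
    if (!input.proxyConfiguration) {
        throw new Error('proxyConfiguration is required');
    }
    return input;
}

const input = buildInput(['https://www.facebook.com/apifytech'], { maxPosts: 3, scrapePosts: true });
console.log(JSON.stringify(input, null, 2));
```

Any other setting from the example can be passed through `overrides`; the helper does not validate those.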
Example output item (values truncated):
{
    "categories": ["Hotel"],
    "info": [
        "Residenc", // ...
        "General Information\n" // ...
    ],
    "likes": 1538,
    "messenger": "https://m.me/22163", // ...
    "posts": [
        {
            "postDate": "2020-09-10T09:33:43.000Z",
            "postText": "Do Prahy opět", // ...
            "postImages": [
                {
                    "link": "https://www.facebook.com/Residen", // ...
                    "image": "https://scontent-ort2-1.xx.fbcdn.net/v/t1.0" // ...
                }
            ],
            "postLinks": ["https://residen"], // ...
            "postUrl": "https://www.facebook.com/permalink.php?story_fbid=", // ...
            "postStats": {
                "comments": 1,
                "reactions": 32,
                "reactionsBreakdown": {
                    "like": 26,
                    "love": 6
                },
                "shares": 1
            },
            "postComments": {
                "count": 0,
                "mode": "RANKED_UNFILTERED",
                "comments": []
            }
        }
    ],
    "priceRange": "$$$",
    "title": "Hotel Resid", // ...
    "pageUrl": "https://www.facebook.com/Residen", // ...
    "address": {
        "city": "Prague, Czech Republic",
        "lat": 50.09136,
        "lng": 14.42575,
        "postalCode": "11000",
        "region": "Prague",
        "street": "Haštalská 19"
    },
    "awards": [],
    "email": "", // ...
    "impressum": [],
    "instagram": "@Residen", // ...
    "phone": "+420 22", // ...
    "products": [],
    "transit": null,
    "twitter": "@Residen", // ...
    "website": "http://", // ...
    "youtube": null,
    "mission": [],
    "overview": [],
    "payment": null,
    "checkins": "2,082 people checked in here",
    "verified": false
}
You can use the unwind parameter to display only the posts from your dataset on the platform, for example:
https://api.apify.com/v2/datasets/zbg3vVF3NnXGZfdsX/items?format=json&clean=1&unwind=posts&fields=posts,title,pageUrl
unwind turns each entry of the posts property into a dataset item of its own. The fields parameter limits the output to the fields you need.
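Such a URL can also be assembled in code. The following is a minimal sketch that uses the standard `URL` class; `buildItemsUrl` is a hypothetical helper name, and the dataset ID is the one from the example URL, to be replaced with your own.

```javascript
// Builds a dataset items URL with the unwind and fields query parameters.
// buildItemsUrl is a hypothetical helper, not part of any Apify library.
function buildItemsUrl(datasetId, { unwind, fields = [] } = {}) {
    const url = new URL(`https://api.apify.com/v2/datasets/${datasetId}/items`);
    url.searchParams.set('format', 'json');
    url.searchParams.set('clean', '1');
    if (unwind) url.searchParams.set('unwind', unwind);
    if (fields.length > 0) url.searchParams.set('fields', fields.join(','));
    return url.toString();
}

const itemsUrl = buildItemsUrl('zbg3vVF3NnXGZfdsX', {
    unwind: 'posts',
    fields: ['posts', 'title', 'pageUrl'],
});
console.log(itemsUrl);
```

Note that `URLSearchParams` percent-encodes the commas in the fields list as `%2C`; the value decodes to the same comma-separated list as in the hand-written URL.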
You can split your dataset by comment, instead of having everything nested. The following code can output one comment per dataset item:
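async ({ data, item, customData, Apify }) => {
    // Separate the posts from the page-level fields
    const { posts = [], ...pageData } = item;
    return posts.flatMap((post) => {
        // postData holds the comment metadata (count, mode); default to no comments
        const { postComments: { comments = [], ...postData } = {}, ...restOfPost } = post;
        // One output item per comment, merged with its post and page fields
        return comments.map((comment) => ({
            ...pageData,
            ...postData,
            ...restOfPost,
            ...comment,
        }));
    });
}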
Each output item will then be flat.
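To see what the flattening produces, the same logic can be run locally on a sample item. This is a sketch under assumptions: `splitByComment` is a hypothetical name for the function body above, the sample item is made up, and the comment field names (`name`, `text`) are placeholders rather than the scraper's actual comment schema.

```javascript
// Same flattening logic as above, written as a plain function for local testing.
function splitByComment(item) {
    const { posts = [], ...pageData } = item;
    return posts.flatMap((post) => {
        const { postComments: { comments = [], ...postData } = {}, ...restOfPost } = post;
        return comments.map((comment) => ({ ...pageData, ...postData, ...restOfPost, ...comment }));
    });
}

// Made-up sample item; the comment fields (name, text) are assumptions.
const sampleItem = {
    title: 'Example page',
    pageUrl: 'https://www.facebook.com/example',
    posts: [
        {
            postText: 'First post',
            postComments: {
                count: 2,
                mode: 'RANKED_UNFILTERED',
                comments: [
                    { name: 'Alice', text: 'Nice!' },
                    { name: 'Bob', text: 'Thanks' },
                ],
            },
        },
        { postText: 'Second post', postComments: { count: 0, mode: 'RANKED_UNFILTERED', comments: [] } },
    ],
};

const rows = splitByComment(sampleItem);
console.log(rows.length); // 2
console.log(rows[0]);
```

Two points follow from the spread order: a post without comments produces no output items, and when a page, post and comment share a key, the comment's value wins.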
You can use the extend scraper function to add more functionality to the scraper. All pages are kept in the map variable:
async ({ page, LABELS, label, request, username, map, fns, customData, Apify }) => {
    if (label === 'HANDLE') {
        // this is inside the handlePageFunction
        const { userData } = request;
        if (
            userData.label === LABELS.PAGE
            && userData.sub === 'home'
        ) {
            // add page banner information from mobile home page, like https://m.facebook.com/apifytech
            await map.append(username, async (pageInfo) => {
                return {
                    ...pageInfo,
                    bannerUrl: await page.evaluate(() => {
                        return document.querySelector('.coverPhoto')?.style.backgroundImage.replace(/(url\(\"|\"\))/g, '') ?? null;
                    })
                };
            });
        }
    } else if (label === 'SETUP') {
        // before starting the crawler
    } else if (label === 'FINISH') {
        // after finishing the crawler
    }
}
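The `replace` call in the example strips the CSS `url("...")` wrapper from the background image value. The sketch below applies the same regular expression to made-up strings outside the browser; `stripCssUrl` is a hypothetical name, and it assumes the value is serialized with double quotes.

```javascript
// The same replace() pattern as in the extend function, applied to sample strings.
// The input strings are made-up examples of a backgroundImage value.
const stripCssUrl = (backgroundImage) => backgroundImage.replace(/(url\("|"\))/g, '');

const quoted = stripCssUrl('url("https://example.com/cover.jpg")');
const unquoted = stripCssUrl('url(https://example.com/cover.jpg)');

console.log(quoted); // wrapper removed
console.log(unquoted); // unquoted form does not match the pattern and is left as-is
```

If the value could arrive unquoted or single-quoted, the pattern would need to be widened accordingly.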
February 20th 2:11AM, but that’s the edited date; the actual post date is February 19th 11:31AM, as provided on the DOM.
This project adheres to semver.
README.md aren’t tagged)
License: Apache-2.0