May 4, 2011

Suggestion - extracting full text article after duplicate check | CyberSEO Pro | Support Forum

Suggestion - extracting full text article after duplicate check
December 23, 2022
2:07 pm
MediFormatica
Member
Forum Posts: 49
Member Since: December 17, 2022

Hi,

This is just a suggestion for your consideration. I noticed in the log that, for feeds where I enabled full-text extraction, the script first fetches the full text, then runs the duplicate check, and only then skips the feed item.

Performance-wise, that’s no big deal: the step takes about 1 second to complete. It can add up, though. Suppose I’m checking a feed for 10 articles (my default) and most or all of them are duplicates. That’s 8-9 extra seconds. If the run covers multiple feeds, say 4, this adds 30-40 seconds, which again is no big deal, but avoiding it would still help with timeouts.
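
As a rough back-of-the-envelope check of that estimate, here is a minimal Python sketch. The function name and its parameters are hypothetical, and the flat 1 second per extraction is just the figure quoted in the post, not a measurement.

```python
def wasted_seconds(feeds: int, duplicates_per_feed: int,
                   seconds_per_extraction: float = 1.0) -> float:
    """Estimate time spent extracting full text for items that are skipped anyway.

    Assumes every duplicate costs one full-text extraction of roughly
    `seconds_per_extraction` seconds (1 second is the figure from the post).
    """
    return feeds * duplicates_per_feed * seconds_per_extraction


if __name__ == "__main__":
    # One feed, 10 items checked, 9 of them duplicates.
    print(wasted_seconds(feeds=1, duplicates_per_feed=9))
    # Four feeds in the same run, 9 duplicates each.
    print(wasted_seconds(feeds=4, duplicates_per_feed=9))
```

With 9 duplicates per feed this gives 9 seconds for one feed and 36 seconds for four, which matches the 30-40 second range mentioned above.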

Performance aside, the full-text extraction is an external script called by CyberSEO, so making fewer calls to a third-party script would definitely improve the stability of the product.

I’m not sure whether the same applies to user-defined PHP code. It would also be great to check for duplicates before deciding whether that code should run.

Anyway, it’s just a thought I had. Of course, your code may depend on the full-text extract for the duplicate check, in which case this suggestion wouldn’t be applicable.

December 23, 2022
2:19 pm
CyberSEO
Admin
Forum Posts: 3709
Member Since: July 2, 2009

The Full-Text RSS script gets called before the custom PHP script, which needs access to the extracted article in order to allow you to modify it. Your custom PHP script may modify not just the article itself, but also its title and even its GUID, which is necessary in some cases.

Thus, only after the PHP script has executed can the plugin do a duplicate check based on the post title and its GUID, both of which can be modified after the full-text extraction.

I think I can add a check of the PHP code contents and, if it’s empty, do the duplicate check before the Full-Text RSS script runs, but I need to check the code first to be sure it won’t hurt any other processes.

December 23, 2022
2:59 pm
MediFormatica

Hmmm, got it. I hadn’t considered changing titles, but you’re right, that would require the duplicate check to be done at the end.

December 23, 2022
4:36 pm
CyberSEO

I’ve implemented this improvement in version 10.002. The post duplicate check will be performed prior to the Full-Text RSS script execution if the custom PHP code is empty and Article Forge is not used (Article Forge may also change the post titles).
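
The ordering described in this thread can be sketched roughly as follows. This is an illustrative Python sketch, not the plugin’s actual code (CyberSEO is a PHP plugin): `FeedItem`, `process_item`, `is_duplicate`, and the `seen` set are all hypothetical names, and matching duplicates on an exact (GUID, title) pair is an assumption. Article Forge rewriting itself is not modeled; only its effect on when the check runs is.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set, Tuple


@dataclass
class FeedItem:
    guid: str
    title: str
    body: str = ""


def is_duplicate(item: FeedItem, seen: Set[Tuple[str, str]]) -> bool:
    # Assumption: an item is a duplicate if its (GUID, title) pair was seen before.
    return (item.guid, item.title) in seen


def process_item(
    item: FeedItem,
    seen: Set[Tuple[str, str]],
    extract_full_text: Callable[[FeedItem], str],
    custom_code: Optional[Callable[[FeedItem], None]] = None,
    uses_article_forge: bool = False,
) -> bool:
    """Return True if the item is imported, False if it is skipped as a duplicate."""
    # The early check is only safe when nothing later can change title or GUID.
    can_check_early = custom_code is None and not uses_article_forge

    if can_check_early and is_duplicate(item, seen):
        return False  # skipped without calling the external extractor

    item.body = extract_full_text(item)  # the expensive third-party call

    if custom_code is not None:
        custom_code(item)  # may modify body, title, or GUID

    # Otherwise the check has to wait until title and GUID are final.
    if not can_check_early and is_duplicate(item, seen):
        return False

    seen.add((item.guid, item.title))
    return True
```

The point of the design is that the external extractor is never called for an item that is already known, unless some later step could still change the fields the duplicate check relies on.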

December 23, 2022
5:35 pm
MediFormatica

That’s cool.

I’ve just updated the plugin now.

December 23, 2022
6:03 pm
MediFormatica

OK, I tried it, and it is way faster than I thought it would be.

I tried it with 5 feeds (none with new posts), each set up to pull 10 posts, and it went through them in only 20 seconds!

December 23, 2022
6:06 pm
CyberSEO

Nice to hear that the improvement was effective.
