May 4, 2011

Cannot extract full article from link | CyberSEO Pro | Support Forum

Avatar

Lost password?
Advanced Search

— Forum Scope —




— Match —





— Forum Options —





Minimum search word length is 3 characters - maximum search word length is 84 characters

sp_TopicIcon
Cannot extract full article from link
Topic Rating: 0 Topic Rating: 0 Topic Rating: 0 Topic Rating: 0 Topic Rating: 0 Topic Rating: 0 (0 votes) 
September 7, 2023
12:14 pm
Avatar
cyberseo.mdrzn
Member
Members
Forum Posts: 9
Member Since:
September 4, 2023
sp_UserOfflineSmall Offline

Hello,

I’m trying to configure the full HTML extractor but it fails every single time, so there must be something I’m doing wrong.

Login to see the quote

But it still fails to retrieve the text? In my settings I set the container as div and the attributes as {“class” : “fp_single-article__col-sx”}

Please provide suggestions on how to fix this issue, so I’ll be able to replicate on other feeds.

Kind regards

September 7, 2023
1:00 pm
Avatar
CyberSEO
Admin
Forum Posts: 3690
Member Since:
July 2, 2009
sp_UserOnlineSmall Online

This was a problem in the content extractor routine and it’s fixed now. Please update the core code of your plugin by clicking the “Update plugin to the latest version” button.

Thanks for heads up.

September 7, 2023
1:07 pm
Avatar
cyberseo.mdrzn
Member
Members
Forum Posts: 9
Member Since:
September 4, 2023
sp_UserOfflineSmall Offline

Thanks, it now works.

I see that it uses gpt3.5-16k, but in the advanced settings I can only specify “Content Spinner: OpenAI”.

How to configure which model should it use? I’d be fine with the 3.5-4k which is half as expensive.

September 7, 2023
1:12 pm
Avatar
CyberSEO
Admin
Forum Posts: 3690
Member Since:
July 2, 2009
sp_UserOnlineSmall Online

The OpenAI content spinner uses gpt3.5-16k. However, I would suggest you an alternative solution: Login to see this link – it works much faster.

September 7, 2023
2:00 pm
Avatar
cyberseo.mdrzn
Member
Members
Forum Posts: 9
Member Since:
September 4, 2023
sp_UserOfflineSmall Offline

The rewritten articles by GPT turn out in english instead of the original article language, is there a way to fix this? I currently have the setting as “Do not translate.”

Haven’t seen any prompt to edit or setting to change to prevent GPT from rewriting it in english.

September 7, 2023
2:03 pm
Avatar
CyberSEO
Admin
Forum Posts: 3690
Member Since:
July 2, 2009
sp_UserOnlineSmall Online

The OpenAI GPT content spinner works with English texts only: Login to see this link

That’s why I suggested you an alternative solution.

September 7, 2023
2:22 pm
Avatar
cyberseo.mdrzn
Member
Members
Forum Posts: 9
Member Since:
September 4, 2023
sp_UserOfflineSmall Offline

I used 3.5-4k before to rewrite articles and it worked great, is there no chance at all to make a change in the plugin to allow it to work with any language? I’m down to make the changes to my .php file only just to customize it. It doesn’t have to be english only.

The solution you proposed is to use this code with the bold text added I guess:

Login to see the quote
but this seems more complicated and error-prone working with HTML than using the basic spinner, since the full text of the article has been correctly extracted.
September 7, 2023
2:25 pm
Avatar
CyberSEO
Admin
Forum Posts: 3690
Member Since:
July 2, 2009
sp_UserOnlineSmall Online

Unfortunately, the “in the original language” directive did not work well with GPT 3.5 when applied to short chunks of text. As far as I know, they are improving it all the time, so maybe the situation has changed. I’ll check it out.

September 7, 2023
2:54 pm
Avatar
cyberseo.mdrzn
Member
Members
Forum Posts: 9
Member Since:
September 4, 2023
sp_UserOfflineSmall Offline

Thanks for looking into it.

I’d be down to collapse all the paragraphs into one chunk of text and then feed that to GPT to rewrite in the same language. All the articles I’ll take are very short (less than 1k words) so there should be no issues with them.

Let me know as soon as you have updates.

Kind regards

September 7, 2023
4:29 pm
Avatar
CyberSEO
Admin
Forum Posts: 3690
Member Since:
July 2, 2009
sp_UserOnlineSmall Online

I’d be down to collapse all the paragraphs into one chunk of text and then feed that to GPT to rewrite in the same language. All the articles I’ll take are very short (less than 1k words) so there should be no issues with them.

You can easily do that using the following article assignment on the “AI article generation” tab:

Login to see the quote

September 9, 2023
10:48 pm
Avatar
cyberseo.mdrzn
Member
Members
Forum Posts: 9
Member Since:
September 4, 2023
sp_UserOfflineSmall Offline

You can easily do that using the following article assignment on the “AI article generation” tab:

Login to see the quote

  

I’m trying to use this solution at this moment, but it doesn’t seem to work well.

Here’s the log:

Login to see the quote

Login to see this link

So I think the issue is in the highlighted red steps, when it rewrites the article.

If I copy the text after the command I highlighted in blue (so my prompt + the article text) and feed it to chatgpt (or via 3.5turbo 16k api) I get the correct response which is the article rewritten in perfect italian (link: Login to see this link)

But when the plugin does the rewriting, for some reason it comes out 50% english and 50% italian, even tho the original article is in italian and the rewrite should be in italian. Is there some prompt baked in that is overriding my prompt?

We’re almost there, it’s almost working perfectly.

September 9, 2023
10:55 pm
Avatar
CyberSEO
Admin
Forum Posts: 3690
Member Since:
July 2, 2009
sp_UserOnlineSmall Online
  1. Never spin a GPT generated article with GPT. It reduces the quality of the content. It takes extra time. It wastes your money. It doesn’t give you any profit, like adding tap water to tap water to make it taste better.
  2. Not all parts of your content can be spun with GPT. Often it says “I can’t rephrase that” or even “That’s against OpenAI’s use case policy”. So these parts of text remain unchanged and you get a 50% or even 30% ratio. This is ChatGPT and it has a lot of restrictions – ethical, political and who knows what else…
September 10, 2023
2:52 am
Avatar
cyberseo.mdrzn
Member
Members
Forum Posts: 9
Member Since:
September 4, 2023
sp_UserOfflineSmall Offline
13sp_Permalink sp_Print
0

Ok my bad, I thought the respinner and the AI generator were the same thing.

I disabled the respinner and it outputted a perfect article in the correct language.

Everything seems good so far, I’ll try different feeds now. Thanks!

September 10, 2023
2:46 pm
Avatar
CyberSEO
Admin
Forum Posts: 3690
Member Since:
July 2, 2009
sp_UserOnlineSmall Online

Unfortunately, OpenAI models come with a set of constraints imposed by their developers based on their own understanding of political correctness. This can create issues for users. However, the future looks promising. Google plans to publicly release the API for their new Gemini AI, which promises to be much more powerful. Hopefully, they will learn from OpenAI’s mistakes and offer a more flexible solution. Keep an eye on Elon Musk’s company as well; they’re also doing groundbreaking research in this area. As soon as the APIs for these new projects become publicly available, they will be immediately integrated into the CyberSEO Pro plugin.

September 17, 2023
2:11 am
Avatar
CyberSEO
Admin
Forum Posts: 3690
Member Since:
July 2, 2009
sp_UserOnlineSmall Online

Alternatively, I would suggest using GPT-4 as a model for the OpenAI GPT content spinner. It gives almost 100% spin ratio, and the quality of the modified content is much better. But there are two unpleasant nuances: It is rather slow, and it is 10 times more expensive.

September 19, 2023
12:48 pm
Avatar
CyberSEO
Admin
Forum Posts: 3690
Member Since:
July 2, 2009
sp_UserOnlineSmall Online

UPDATE: Since version 10.110 it is recommended to use GPT-3.5 Turbo Instruct as base model in the CyberSEO Pro GPT Content Spinner settings for better results.

September 27, 2023
3:52 pm
Avatar
cyberseo.mdrzn
Member
Members
Forum Posts: 9
Member Since:
September 4, 2023
sp_UserOfflineSmall Offline
17sp_Permalink sp_Print
0

CyberSEO said
UPDATE: Since version 10.110 it is recommended to use GPT-3.5 Turbo Instruct as base model in the CyberSEO Pro GPT Content Spinner settings for better results.  

I only use the AI Generator, not the Content Spinner.

I’m getting a new kind of issue at the moment, where it’s not getting the full article to rewrite:

Login to see the quote

The plugin only extracts the intro abstract from the URL above, instead of the full article. So the “generated article” is really short and 100% identical, there has been no rewrite at all: Login to see this link dot com/2023/09/27/el-jefe-shakira-e-la-tata-che-ha-scoperto-il-tradimento-di-pique-testo-traduzione-e-significato/

September 27, 2023
6:15 pm
Avatar
CyberSEO
Admin
Forum Posts: 3690
Member Since:
July 2, 2009
sp_UserOnlineSmall Online

You can use GPT-3.5 Turbo Instruct for the AI article generator as well as in your HTML post templates.

Your feed already contains full-text articles, so you don’t need to extract them with container tag. The article can also be extracted using the Full-Text RSS script.

As mentioned above, GPT-3.5 is not so smart about rewriting HTML content (I would say not always and not at all). GPT-4 can do it easily, but GPT-3.5 might not.

You can always test your GPT assignments (prompts) with any of OpenAI’s GPT models here: Login to see this link

Simply select the appropriate model, write your prompt, and copy your source article in HTML format. This way you can evaluate the capabilities of each GPT model and see how it works with your content. If it works there, it will work the same way with CyberSEO Pro.

September 27, 2023
7:00 pm
Avatar
cyberseo.mdrzn
Member
Members
Forum Posts: 9
Member Since:
September 4, 2023
sp_UserOfflineSmall Offline
19sp_Permalink sp_Print
0

CyberSEO said
You can use GPT-3.5 Turbo Instruct for the AI article generator as well as in your HTML post templates.

Your feed already contains full-text articles, so you don’t need to extract them with container tag. The article can also be extracted using the Full-Text RSS script.

The feed doesn’t contain the full article (Login to see this link) but I extract it as you can see from the logs above and repeated here:

[27-09-23 01:46:46] Attributes specified: {“class”: “fp_single-article__content”}

So it extracts the full column with the full article, but only rewrites the abstract. In the logs, it sometimes stops whenever it encounter the taboola script tag:

Login to see the quote

So in this case it didn’t extract the full text. Check the website and the tag fp_single-article__content and you’ll see that it contains the full article. Is this fixable? Sometimes it happens sometimes it doesn’t, not sure why or if it’s being loaded only on some articles.

 

As mentioned above, GPT-3.5 is not so smart to rewrite HTML content. GPT-4 can do it easily, but not GPT-3.5 yet.

You can always test your GPT assignments (prompts) with any of OpenAI’s GPT models here: Login to see this link

Simply select the appropriate model, write your prompt, and copy your source article in HTML format. This way you can evaluate the capabilities of each GPT model and see how it works with your content. If it works there, it will work the same way with CyberSEO Pro.  

The rewriting seems to work fine when it gets the full article, it fails to rewrite when it’s less than 100/200 words. In the playground and in chatGPT it works kinda differently from the API/plugin, even for the same prompts, but there’s nothing we can do about it. As long as I get the full article I’m fine.

Also I need to remove (whenever it’s there) a string that gets copied from the original article similar to “Click here to read more”. Is there a filter for that (which searches for a specific string and removes it), or the only fix is via editing the prompt?

Also, if I may request a new feature, a “clone feed” button so it keeps all the same settings but lets me put a new url would be cool, otherwise it’s a lot of repeating the same settings manually.

September 27, 2023
7:55 pm
Avatar
CyberSEO
Admin
Forum Posts: 3690
Member Since:
July 2, 2009
sp_UserOnlineSmall Online

I see the problem. Your post assignment is incorrect. The full-text article is already in the feed and should not be extracted. Just disable the full text article extraction at all. Your articles are short and they have no formatting, images, etc. because you forgot to ask OpenAI GPT to return the result in HTML format, which is the main point of this manual: Login to see this link

Here is the fixed version of your “Article assignment” (not sure for my Italian though ;)):

Login to see the quote

Use it and you’ll get the right result.

P.S. Always check your assignments at Login to see this link – it really helps. Of course, it will never return the same result, because every generated response is unique, but you can see how it really works. Please don’t ignore this practice.

Forum Timezone: Europe/Amsterdam

Most Users Ever Online: 541

Currently Online: CyberSEO, SKMIT
4 Guest(s)

Currently Browsing this Page:
1 Guest(s)

Top Posters:

ninja321: 84

s.baryshev.aoasp: 64

Freedom: 61

MediFormatica: 49

B8europe: 47

saviulisse67: 45

Member Stats:

Guest Posters: 338

Members: 2656

Moderators: 0

Admins: 1

Forum Stats:

Groups: 1

Forums: 5

Topics: 1542

Posts: 7782

Newest Members:

SKMIT, kellyslrm, pu.analytics.ee, frajaros, federico_pac, rexfordkent

Administrators: CyberSEO: 3689