how to filter html from content for search results
Joined: 2009/3/3 4:18
From Belgium
Posts: 1944
Hi, I'm upgrading my search results to show the beginning of a page's text as part of each result. For example, for the content module, I changed the file include/search.php to take the text field into account by adding
Quote:
$item['description'] = $contentArray['content_description'];

to the assignments.

In addition, I added a reference to the description field in the system/template/system_search.html template, like this:
Quote:
<{$search_results[$sort_key].results[cur_result].description|truncate:250:"..."}>


The problem is that the content contains HTML, which doesn't always display well in the results. I'd like to filter it out. Should I use HTMLPurifier for that?

Posted on: 1/18 14:16:38
_________________
d-log - My personal site
Openhub profile


Re: how to filter html from content for search results
Joined: 2007/12/4 9:00
Posts: 1199
This might work.

If you want to get rid of everything, but perhaps keep a few specific tags like paragraphs or links, you could use strip_tags() with the optional second argument, which lets you whitelist the tags you want to keep. It's not a very reliable function, though, as it tends to chop everything as soon as it sees a <.
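A minimal sketch of both uses of `strip_tags()` in plain PHP: removing every tag, and keeping a whitelist. The sample HTML is invented for illustration.

```php
<?php
// Sample teaser HTML (invented for illustration).
$html = '<p>Read the <b>new</b> <a href="/docs">guide</a>.</p>';

// No second argument: every tag is removed, the text between tags is kept.
$plain = strip_tags($html);

// Second argument: a whitelist of tags to keep; all other tags are removed.
// Kept tags are copied verbatim, attributes included.
// Since PHP 7.4 the whitelist may also be an array such as ['p', 'a'].
$limited = strip_tags($html, '<p><a>');

echo $plain, "\n";   // Read the new guide.
echo $limited, "\n"; // <p>Read the new <a href="/docs">guide</a>.</p>
```

Because `strip_tags()` does not really parse HTML, a literal `<` in ordinary text (for example `5<6`) can be mistaken for the start of a tag, and the text after it can be dropped.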

I think this is partly a problem with the way some of the modules (including my own) are structured, though. We really need separate teaser and description fields, and if there is an image associated with the teaser or description, it would be a lot better if it were uploaded as a separate field rather than embedded as an HTML tag inside the teaser.

I will address this in a new version of Library, which will replace all my content modules.

Posted on: 1/19 2:50:55


Re: how to filter html from content for search results
Joined: 2009/3/3 4:18
From Belgium
Posts: 1944
Hm, I applied strip_tags() to the variable that I'm outputting in search.inc.php, and the paragraph elements are still shown. Perhaps I'm filtering the wrong variable, or not using the filtered version.

The basis seems sound, though.
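One possible explanation, offered as an assumption rather than a diagnosis: if the stored description is HTML-escaped (`&lt;p&gt;` instead of `<p>`), `strip_tags()` finds no tags to remove and the browser then displays the escaped markup as text. Decoding the entities first, and assigning the filtered value to the field the template prints, covers both cases. `$contentArray` and `$item` mirror the variables from the search.php change at the start of the thread; the stored value and the helper `plainDescription` are hypothetical.

```php
<?php
// Stand-in for one content row (hypothetical value, stored HTML-escaped).
$contentArray = ['content_description' => '&lt;p&gt;Welcome to &lt;b&gt;our&lt;/b&gt; site&lt;/p&gt;'];

// Hypothetical helper: turn escaped markup back into real tags, then remove the tags.
function plainDescription(string $stored): string
{
    $decoded = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim(strip_tags($decoded));
}

// strip_tags() alone leaves escaped markup untouched: there is no "<" to find.
$unchanged = strip_tags($contentArray['content_description']);

// Assign the *filtered* value to the key the template actually prints.
$item = [];
$item['description'] = plainDescription($contentArray['content_description']);

echo $item['description'], "\n"; // Welcome to our site
```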

Posted on: 1/20 8:51:23


Re: how to filter html from content for search results
Joined: 2007/12/4 9:00
Posts: 4237
There is an IPF method in icms_ipf_Metagen (look at setDescription) that is used to generate the description (no HTML) from the body. It is a multi-step process: first convert to plain text, then truncate.

This might be the best option, because in addition to HTML, there can be ICMS codes and tags within the body that would also need to be accounted for.
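The two-step order (plain text first, then truncate) can be sketched in standalone PHP as below. This is not the code of `icms_ipf_Metagen::setDescription()`; the function name `searchExcerpt` and the bracket-code pattern for things like `[b]` or `[url=...]` are assumptions, and a real site should let its own sanitizer render ICMS codes rather than drop them with a regex.

```php
<?php
// Hypothetical helper, not the ImpressCMS implementation: build a plain-text
// excerpt of at most $maxChars characters (plus the ellipsis).
// The result is plain text: escape it (e.g. htmlspecialchars) when printing into HTML.
function searchExcerpt(string $body, int $maxChars = 250, string $ellipsis = '...'): string
{
    // Step 1: convert to plain text.
    // Drop assumed bracket codes such as [b], [/b] or [url=...], keeping the text between them.
    $text = preg_replace('/\[\/?[a-z]+(=[^\]]*)?\]/i', '', $body);
    // Put a space before line breaks and closing block tags so words from
    // adjacent paragraphs do not run together once the tags are gone.
    $text = preg_replace('/<(br|\/p|\/div|\/li|\/h[1-6])\b/i', ' <$1', $text);
    $text = strip_tags($text);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse the whitespace left behind by the removed markup.
    $text = trim(preg_replace('/\s+/u', ' ', $text));

    // Step 2: truncate the plain text, never the markup.
    if (mb_strlen($text, 'UTF-8') <= $maxChars) {
        return $text;
    }
    // Look one character past the limit, so a word ending exactly at the limit is kept.
    $cut = mb_substr($text, 0, $maxChars + 1, 'UTF-8');
    $lastSpace = mb_strrpos($cut, ' ', 0, 'UTF-8');
    $cut = ($lastSpace !== false && $lastSpace > 0)
        ? mb_substr($cut, 0, $lastSpace, 'UTF-8')  // cut at the last word boundary
        : mb_substr($cut, 0, $maxChars, 'UTF-8');  // one long word: hard cut
    return rtrim($cut) . $ellipsis;
}

echo searchExcerpt('<p>Hello <b>world</b></p><p>[b]Second[/b] &amp; last</p>', 15), "\n"; // Hello world...
```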

I remember a discussion in the forums a long time ago about substr() and HTML tags that get split.
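A small demonstration of the split-tag problem, with an invented sample string: cutting the raw HTML first can leave half a tag behind, while stripping first and then cutting only ever shortens text.

```php
<?php
$html = '<p>Hello <a href="/page">link</a></p>';

// Wrong order: truncating the markup splits the <a> tag in half.
$truncatedFirst = substr($html, 0, 12);

// Right order: strip the tags, then truncate the remaining text.
$strippedFirst = substr(strip_tags($html), 0, 12);

var_dump($truncatedFirst, $strippedFirst); // "<p>Hello <a " and "Hello link"
```

Note that `substr()` counts bytes, so for non-ASCII content `mb_substr()` is the safer choice, as it will not cut a multibyte character in two.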

Posted on: 1/22 21:00:04
_________________
Steve
Twitter: @skenow
Facebook: Steve Kenow


Re: how to filter html from content for search results
Joined: 2009/3/3 4:18
From Belgium
Posts: 1944
Yes, that will be a better option; you are right about the ICMS codes and such.

Thanks for the hint!

Posted on: 1/30 14:56:12


Re: how to filter html from content for search results
Joined: 2007/12/4 9:00
Posts: 1199
Speaking of BB codes etc., what do you think we should do with them? Keep supporting them, or phase them out? Are they still relevant now that we have HTMLPurifier?

I'm cleaning out my 15-year-old work site and it's pretty horrible. My oldest "XOOPS"-era content was just plain text with BB codes and line breaks for paragraphs; later on it was BB codes + HTML, and then, a fair while back, I moved to pure HTML. I'm converting it all to HTML now, but it has literally taken me a couple of months, and I probably still have a couple of weeks to go.
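For the kind of migration described here (plain text with BB codes and line-break paragraphs), a one-off conversion pass might look like the sketch below. The set of codes handled (`[b]`, `[i]`, `[url=...]`) and the function name `legacyToHtml` are assumptions; a real site needs its own list, and nested or malformed codes are not handled.

```php
<?php
// Hypothetical one-off converter for legacy "plain text + BB code" content.
function legacyToHtml(string $text): string
{
    // Escape first, so any raw < or & in the old text cannot become markup.
    $text = htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // A few common BB codes (assumed set; extend per site).
    $text = preg_replace('/\[b\](.*?)\[\/b\]/is', '<strong>$1</strong>', $text);
    $text = preg_replace('/\[i\](.*?)\[\/i\]/is', '<em>$1</em>', $text);
    // Only http(s) targets are converted, so [url=javascript:...] stays inert text.
    $text = preg_replace('/\[url=(https?:\/\/[^\]\s]+)\](.*?)\[\/url\]/is', '<a href="$1">$2</a>', $text);

    // Blank lines separate paragraphs; single line breaks become <br>.
    $paragraphs = preg_split('/\R{2,}/', trim($text));
    $html = '';
    foreach ($paragraphs as $p) {
        $html .= '<p>' . preg_replace('/\R/', "<br>\n", $p) . "</p>\n";
    }
    return $html;
}

echo legacyToHtml("Hello [b]world[/b]\nsecond line\n\n[url=https://example.com]Site[/url] & more");
```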

Posted on: 2/3 3:54:30


Re: how to filter html from content for search results
Home away from home
Joined:
2007/12/4 9:00
Posts: 4237
There are some benefits to BB codes, especially when they provide functionality that isn't just a shortcut to HTML, like wiki links, hashtags and mentions. In most(?) cases, if you are using the WYSIWYG editors, the BB codes are not preserved when saved to the database; they are converted to the equivalent HTML markup.
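To illustrate codes that add behaviour rather than just shortcut HTML, here is a sketch that turns `[[Wiki links]]`, `#hashtags` and `@mentions` into links when content is rendered. The function name and the `/wiki/`, `/tag/` and `/user/` URL patterns are invented for the example and are not ImpressCMS routes.

```php
<?php
// Hypothetical renderer for "functional" codes; the input is treated as plain text.
function renderSocialCodes(string $text): string
{
    // Escape before adding any markup of our own.
    $text = htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // [[Page Name]] -> link to a wiki page; spaces in the slug become underscores.
    $text = preg_replace_callback('/\[\[([^\]]+)\]\]/', function (array $m): string {
        $title = html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        $slug = rawurlencode(str_replace(' ', '_', $title));
        return '<a href="/wiki/' . $slug . '">' . $m[1] . '</a>';
    }, $text);

    // #hashtag and @mention only at the start or after whitespace,
    // so e-mail addresses such as me@example.com are left alone.
    $text = preg_replace('/(^|\s)#(\w+)/u', '$1<a href="/tag/$2">#$2</a>', $text);
    $text = preg_replace('/(^|\s)@(\w+)/u', '$1<a href="/user/$2">@$2</a>', $text);

    return $text;
}

echo renderSocialCodes('See [[Main Page]] #php @alice'), "\n";
```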

When I was converting the wiki from MediaWiki to SimplyWiki, it was a total pain; I had to spend a lot of time adapting and writing conversion routines. BB codes and Markdown are not portable and can lead to a lot of effort at some point.

Posted on: 2/5 1:35:44


Re: how to filter html from content for search results
Home away from home
Joined:
2009/3/3 4:18
From Belgium
Posts: 1944
I'm not in favor of storing a transformed version of what the user has entered in the WYSIWYG editor. What the user enters must be preserved.

Take the case where you filter a field based on the user's role (common on the site we work on): when a user is promoted from author to webmaster, a role that comes with less filtering, you don't want them to have to edit all their previous postings.

The case for filtering the content when it is stored is not a strong one once you take into account that a normal production site uses caching, so the heavy work (filtering the content) is done only once, when the page is put in the cache.
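The approach argued for here (store exactly what the user typed, filter on output according to the author's current role, and let the cache absorb the cost) can be sketched as follows. The role names, the allowed-tag lists, the function `renderForRole` and the in-memory array used as a cache are all hypothetical. `strip_tags()` keeps attributes on allowed tags, so it is only a dependency-free placeholder here, not a security filter; in practice a real HTML filter such as HTMLPurifier and a persistent cache would take its place.

```php
<?php
// Hypothetical role -> allowed tags mapping (strip_tags() whitelist format).
const ALLOWED_TAGS_BY_ROLE = [
    'author'    => '<p><a><b><i>',
    'webmaster' => '<p><a><b><i><div><img><table><tr><td>',
];

// The raw content is stored untouched; the filtered version is derived on output.
function renderForRole(string $raw, string $role, array &$cache): string
{
    // The role is part of the cache key, so a promotion produces a fresh
    // rendering without anyone having to re-edit old posts.
    $key = $role . ':' . md5($raw);
    if (!isset($cache[$key])) {
        $tags = array_key_exists($role, ALLOWED_TAGS_BY_ROLE) ? ALLOWED_TAGS_BY_ROLE[$role] : '';
        // Placeholder filter only: attributes on allowed tags survive strip_tags().
        $cache[$key] = strip_tags($raw, $tags);
    }
    return $cache[$key];
}

$cache = [];
$raw = '<div class="box"><p>Hi</p></div>';
echo renderForRole($raw, 'author', $cache), "\n";    // <p>Hi</p>
echo renderForRole($raw, 'webmaster', $cache), "\n"; // <div class="box"><p>Hi</p></div>
```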

On topic: BB codes are old skool, but they have an offspring in our own 'custom tags' functionality. That is a powerful feature that isn't used as much as it could be.

It depends heavily on your type of users and site. Some sites have technical users who know their way around HTML; for them it is no problem to have to encode everything in the editor. If your site uses a complex design, they must be able to apply the right classes to the different DIVs to get it right.

Other sites have users who just want to enter text and want the system to transform it into the right layout. Those users don't know about HTML or DIVs, but always want the most complex designs. For them, BB codes might be too much, but Markdown might be a compromise.

Posted on: 2/6 7:50:40