The mighty Google Search: We all dream of controlling it, and even taming it. Actually, we can. Venerable browser filters such as Google Hit Hider block entire domains with a click. uBlock Origin selects and blocks any page element that distracts from the main results. Less well known is what used to be called Google Custom Search Engine (CSE), which Google renamed Programmable Search Engine (PSE) in 2020. Any user can create their own PSE and feed it a list of public URLs; its Google-like search will then show results only from those URLs. The user can also see Image Search results for the same query, constrained in the same way. The PSE’s relevancy ranking and speed are the same as with a Google Search.
If you last looked at CSE years ago, you’ll find much has changed, and not just the name. What has not changed is that nonprofits and educational institutions can turn off Google’s text advertisements. However, doing this is no longer a simple matter of flicking a toggle and Google trusting you. Now a nonprofit or university should first register with Google for Nonprofits or Google for Education. Both of these platforms then offer links to ad-free custom PSE creation.
GETTING STARTED
An important initial concern among readers may be the autocomplete option for queries. PSEs now get Google Search’s full range of autocomplete prompts. (This was not previously the case.) But autocomplete can currently be turned off if it’s unwanted.
How much time should you allow to create a PSE? For those familiar with the Control Panel, building and testing a small PSE may take only 2 hours. But making a large one, and doing it properly, may take a beginner days of reading, learning, and trial and error. There are many requirements to fulfill and pitfalls to avoid. For instance, a big initial stumbling block is that a large URL list must be uploaded in chunks, each of which has to meet a tiny upload-size limit.
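One way to prepare a large URL list for chunked upload is to split it by size before you open the Control Panel. The Python sketch below is only an illustration: the function name chunk_urls and the max_bytes value are assumptions made for the example, not Google's figures, so check the current limit in the PSE help pages and substitute it.

```python
from typing import Iterable, List


def chunk_urls(urls: Iterable[str], max_bytes: int) -> List[List[str]]:
    """Group URL patterns into chunks that each fit within max_bytes.

    Size is measured as UTF-8 bytes, counting one newline per pattern.
    max_bytes is a placeholder: the real upload limit is not stated here.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # skip blank lines in the source list
        size = len(url.encode("utf-8")) + 1  # +1 for the trailing newline
        if size > max_bytes:
            raise ValueError(f"Pattern is larger than the chunk limit: {url}")
        # Start a new chunk when adding this pattern would exceed the limit.
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current = []
            current_size = 0
        current.append(url)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical patterns and a deliberately small limit for demonstration.
    patterns = [
        "https://a.example/*",
        "https://b.example/*",
        "https://c.example/*",
    ]
    for number, chunk in enumerate(chunk_urls(patterns, max_bytes=40), start=1):
        print(f"chunk {number}: {len(chunk)} pattern(s)")
```

Each chunk can then be saved to its own text file and uploaded separately. Splitting by byte size rather than by line count keeps long URLs from pushing a chunk over the limit.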
CHANGES
Those who want an ad-running PSE, or to convert an old ad-free one, are currently out of luck. In April 2022, monetization was abruptly suspended, except for lucky Google users who were already running ads on at least one PSE. Google says it is “creating a new system for publishers seeking to monetize their search engines,” but it has yet to announce details or dates.
Also gone are the variant linked CSEs, a powerful way to use your own huge, self-hosted URL list (while Google just handled the query processing). Some changes are for the better. In 2019, Google released a new mobile layout of results, and in 2020, it greatly improved the layout of the Image Search results.
Undocumented recent positive changes include lifting the cap of 5,000 URLs across all of your account’s PSEs. (Google calls them “patterns,” since you can use a /*/ wildcard in the path.) I run five on my Google account, one at the maximum of 5,000. But in the last year, I’ve found I can start a new PSE, in addition to my earlier ones, and there I can add new URLs that would once have taken me beyond the 5,000 total.
There have been many other changes and improvements in the last few years. Some may be unwelcome, such as breadcrumb URLs on results, but they can often be reverted in the Control Panel. Note that Google is switching users to a new tablet-centric Control Panel, which at present appears to be missing some significant functionality, such as XML backup export and the URL pattern finder box. Hopefully, these items will be added back by the time the swap-over is enforced.
SEARCHES
Note that PSEs will require a more sophisticated search query from users than Google Search. This may usefully reduce CAPTCHA roadblocks for complex searches. But not all users will be aware of the need for some complexity. Casual users may try to test a PSE with a few words and then might be disappointed with lackluster or few results. Some user education may be required.
Your Control Panel shows top user searches. For instance, one of my PSEs recently had top searches for “hockney falco” (art history), “depression affects a business” (business studies), and “post production house” (movie production). Such user search phrases were being misused by the SEO crowd and have been removed from the API version. But they remain in the user’s Control Panel.
You could, of course, build your own Google Search equivalent and then regularly crawl your target URLs—if they’ll let you. That’s fine for a university with perhaps 100 websites and a repository, all of which you control. But many third-party websites will only allow known crawlers. Some will only allow crawling by Google. There have also been wider political changes affecting PSEs. For instance, Google services are reported to be banned in China.
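To see how a site can admit Google's crawler while turning away everyone else, it helps to test a robots.txt file against two crawler names. The sketch below uses Python's standard urllib.robotparser module. The sample robots.txt, the can_crawl helper, and the crawler name MyCrawler are all invented for illustration; a real site's rules may differ, and some sites block crawlers by other means entirely.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that admits Googlebot and blocks all other crawlers.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/articles/1"
    for agent in ("Googlebot", "MyCrawler"):
        verdict = "allowed" if can_crawl(SAMPLE_ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

Running a check like this across your target URL list gives a rough idea of how much of it a home-built crawler could reach.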
RESOURCES
PSE help pages
PSE help community
PSE blog
Tip: In the main Google Search, you can handcraft a temporary mini PSE using the following:
keyword (inurl:2022) (site:wordpress.com | site:squarespace.com | site:wix.com | site:blogger.com | site:tumblr.com | site:typepad.com)
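If you reuse this kind of query often, a short script can assemble it from a list of domains. The Python sketch below simply reproduces the pattern shown in the tip. The function names build_site_query and search_url are made up for the example, and the resulting link is meant to be opened in a browser, not fetched by a script.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_site_query(keyword: str, sites: Sequence[str],
                     inurl: Optional[str] = None) -> str:
    """Compose a query limited to the given sites, in the tip's format."""
    if not sites:
        raise ValueError("At least one site is required")
    parts = [keyword]
    if inurl:
        parts.append(f"(inurl:{inurl})")
    # Join the site: operators with " | " so a match on any site qualifies.
    parts.append("(" + " | ".join(f"site:{site}" for site in sites) + ")")
    return " ".join(parts)


def search_url(query: str) -> str:
    """Percent-encode the query into a Google Search link."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    blog_hosts = ["wordpress.com", "squarespace.com", "wix.com",
                  "blogger.com", "tumblr.com", "typepad.com"]
    query = build_site_query("keyword", blog_hosts, inurl="2022")
    print(query)
    print(search_url(query))
```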